Challenges in Optimizing Job Scheduling on Mesos (Alex Gaudio)



slide-1
SLIDE 1

Challenges in Optimizing Job Scheduling on Mesos

Alex Gaudio

slide-2
SLIDE 2
slide-3
SLIDE 3
  • Data Scientist and Engineer at Sailthru
  • Mesos User
  • Creator of Relay.Mesos

Who Am I?

slide-4
SLIDE 4
  • Data Scientist and Engineer at Sailthru

○ Distributed Computation and Machine Learning

  • Mesos User

○ 1 year

  • Creator of Relay.Mesos

○ intelligently auto-scale Mesos tasks

Who Am I?

slide-5
SLIDE 5
slide-6
SLIDE 6

What are the goals of this talk?

  • 1. Understand the problem of job scheduling using basic principles

slide-7
SLIDE 7

What are the goals of this talk?

  • 1. Understand the problem of job scheduling using basic principles

  • 2. Learn ways to think about, use, or develop Mesos more effectively

slide-8
SLIDE 8

What are the goals of this talk?

  • 1. Understand the problem of job scheduling using basic principles

  • 2. Learn how to think about, use, or develop Mesos more effectively

  • 3. Have some fun along the way!
slide-9
SLIDE 9
slide-10
SLIDE 10

Contents

  • The Problem of Utilization
  • How does Mesos do (or not do) Job Scheduling?
slide-11
SLIDE 11

The Problem of Utilization

Here’s a Box

slide-12
SLIDE 12

The Problem of Utilization

[Figure: a box with axes labeled Length, Width and Height]

It has 3 dimensions

slide-13
SLIDE 13

What can you do with a box that has 3 dimensions?

slide-14
SLIDE 14

What does this mean?!

slide-15
SLIDE 15

The Problem of Utilization

Stuff the box

slide-16
SLIDE 16

The Problem of Utilization

Unpack the box

slide-17
SLIDE 17

The Problem of Utilization

Box in a box

slide-18
SLIDE 18

The Problem of Utilization

Carry the box

slide-19
SLIDE 19

The Problem of Utilization

slide-20
SLIDE 20

The Problem of Utilization

Is really … All about the box!

slide-21
SLIDE 21

The Problem of Utilization

By example: please efficiently pack these stolen boxes into my getaway car!

slide-22
SLIDE 22

The Problem of Utilization

slide-23
SLIDE 23

The Problem of Utilization

[Figure: a box and a computer]

A computer is really just a box

slide-24
SLIDE 24

The Problem of Utilization

[Figure: a box with axes labeled Height, Length and Width]

We can represent a box with 3 dimensions

slide-25
SLIDE 25

The Problem of Utilization

[Figure: the box's axes relabeled RAM, CPU, Disk, …]

If we relabel the dimensions

slide-26
SLIDE 26

The Problem of Utilization

[Figure: a box with axes labeled RAM, CPU and Disk]

A computer, like a box, is a multi-dimensional object.

slide-27
SLIDE 27

The Problem of Utilization

[Figure: a box with axes labeled RAM, CPU and Disk]

A computer is just a collection of resources.
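Treating a machine (and a process's demand) as a vector of resources turns "does it fit?" into a per-dimension check. A minimal sketch, assuming three dimensions (CPU cores, RAM in GB, disk in GB) and made-up sizes; the `Resources` class is hypothetical, not part of Mesos.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """A machine's capacity, or a process's demand, as a point in resource space.

    Hypothetical units: CPU cores, RAM in GB, disk in GB.
    """
    cpu: float
    ram: float
    disk: float

    def fits_in(self, capacity: "Resources") -> bool:
        # A demand fits only if it fits along every dimension at the same time.
        return (self.cpu <= capacity.cpu
                and self.ram <= capacity.ram
                and self.disk <= capacity.disk)

    def __sub__(self, other: "Resources") -> "Resources":
        # Resources left over after placing `other` on this machine.
        return Resources(self.cpu - other.cpu,
                         self.ram - other.ram,
                         self.disk - other.disk)


machine = Resources(cpu=8, ram=32, disk=500)
web_server = Resources(cpu=2, ram=4, disk=10)
ml_job = Resources(cpu=4, ram=48, disk=50)

print(web_server.fits_in(machine))  # True
print(ml_job.fits_in(machine))      # False: CPU fits, RAM does not
print(machine - web_server)         # Resources(cpu=6, ram=28, disk=490)
```

The `ml_job` case is the point of the box analogy: a process that is small in one dimension can still be too big for the box in another.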

slide-28
SLIDE 28

If we put things in boxes, what can we put in our computer?

The Problem of Utilization

slide-29
SLIDE 29

What can we put in our computer?

The Problem of Utilization

Processes!

slide-30
SLIDE 30

Output of a computer’s Process Tree

$ pstree

slide-31
SLIDE 31

This is an interesting slide!

$ pstree

slide-32
SLIDE 32

Why is the pstree slide interesting?

  • 1. It introduces the concept of a process.

A process is an instance of code that accesses resources over time.
slide-33
SLIDE 33

Why is the pstree slide interesting?

  • 1. It introduces the concept of a process.

A process may use, share, steal, lock or release resources

slide-34
SLIDE 34

Why is the pstree slide interesting?

  • 2. It shows a computer with multiple processes running on it.
slide-35
SLIDE 35

Why is the pstree slide interesting?

  • 2. It shows a computer with multiple processes running on it.
  • The processes access the same pool of resources.
slide-36
SLIDE 36

Why is the pstree slide interesting?

  • 2. It shows a computer with multiple processes running on it.
  • Shared access to same pool of resources.
  • Processes are categorized into a hierarchical structure.
slide-37
SLIDE 37

At this point, we can ask a couple of great questions!

slide-38
SLIDE 38

At this point, we can ask a couple of great questions!

  • Why don't computers just have 1 process per box?

slide-39
SLIDE 39

At this point, we can ask a couple of great questions!

  • Why don't computers just have 1 process per box?
  • Is it inefficient to have so many processes on one box?
slide-40
SLIDE 40

At this point, we can ask a couple of great questions!

  • Why don't computers just have 1 process per box?
  • Is it inefficient to have so many processes on one box?
  • Aren’t processes just another kind of box?
slide-41
SLIDE 41

The Problem of Utilization

Let’s try to answer these questions!

slide-42
SLIDE 42

The Problem of Utilization

[Figure: a computer's resources along two axes, CPU and RAM]

Imagine a computer with only 2 resources.

slide-43
SLIDE 43

The Problem of Utilization

[Figure: three process types (Process 1, Process 2, Process 3) on CPU time and RAM axes]

Imagine a computer with only 2 resources. Only 3 distinct process types run on this computer.

slide-44
SLIDE 44

The Problem of Utilization

[Figure: Process 2 placed on the CPU time and RAM axes]

There is a fixed number of ways we can use up the computer’s resources.
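Because the computer is finite and there are only 3 process types, the possible packings can be listed exhaustively. A minimal sketch, assuming hypothetical sizes: 10 units each of CPU time and RAM, Process 2 as large as the whole computer, and small made-up sizes for Processes 1 and 3. None of these numbers come from the talk.

```python
from itertools import product

# Hypothetical sizes in abstract units; chosen only for illustration.
CAPACITY = {"cpu": 10, "ram": 10}
PROCESS_TYPES = {
    "Process 1": {"cpu": 2, "ram": 2},
    "Process 2": {"cpu": 10, "ram": 10},  # as big as the whole computer
    "Process 3": {"cpu": 1, "ram": 3},    # RAM-heavy, CPU-light
}


def feasible_packings(capacity, process_types):
    """Return every (counts, used) combination of processes that fits on the computer."""
    names = list(process_types)
    # Upper bound for each type: how many copies fit on an otherwise empty computer.
    limits = [min(capacity[r] // process_types[n][r] for r in capacity) for n in names]
    packings = []
    for counts in product(*(range(k + 1) for k in limits)):
        used = {r: sum(c * process_types[n][r] for c, n in zip(counts, names))
                for r in capacity}
        if all(used[r] <= capacity[r] for r in capacity):
            packings.append((dict(zip(names, counts)), used))
    return packings


packings = feasible_packings(CAPACITY, PROCESS_TYPES)
print(len(packings))  # 15, counting the empty computer
for counts, used in packings:
    if used == CAPACITY:  # every resource fully used
        print(counts)
# {'Process 1': 0, 'Process 2': 1, 'Process 3': 0}
# {'Process 1': 5, 'Process 2': 0, 'Process 3': 0}
```

Real process sizes would change the numbers, but not the shape of the result: the set of packings is finite, and only a few of them use every dimension fully.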

slide-45
SLIDE 45

The Problem of Utilization

[Figure: a single Process 2 occupying the CPU time and RAM axes]

There is a fixed number of ways we can use up the computer’s resources.

1 process at a time.

This could be great if all processes were the size of the computer.

slide-46
SLIDE 46

The Problem of Utilization

[Figure: Process 2 and Process 3 sharing the CPU time and RAM axes]

There is a fixed number of ways we can use up the computer’s resources.

2+ processes sharing resources.

New concept: shared state

slide-47
SLIDE 47

The Problem of Utilization

Different Utilization Strategies

[Figure: Process 1, Process 2 and Process 3 together on the CPU time and RAM axes]

Maximum variation, but under-utilized

slide-48
SLIDE 48

The Problem of Utilization

[Figure: copies of Process 1 filling the CPU time and RAM axes]

There is a fixed number of ways we can use up the computer’s resources.

Maximum utilization, no variation

slide-49
SLIDE 49

The Problem of Utilization

[Figure: four copies of Process 3 on the CPU time and RAM axes]

There is a fixed number of ways we can use up the computer’s resources.

Over-provisioned and under-utilized

slide-50
SLIDE 50

The Problem of Utilization

[Figure: four copies of Process 3 on the CPU time and RAM axes]

Competing for shared resources. Unclear consequences.

Over-provisioned and under-utilized

slide-51
SLIDE 51

A multi-dimensional problem! And very complicated!

The Problem of Utilization

slide-52
SLIDE 52

Take-Aways

There are many ways to use a computer’s resources, and many different factors inform how we choose to utilize them.

slide-53
SLIDE 53

Benefits of Shared State

  • increased utilization
  • flexibility to do different things simultaneously
  • exposes a lot of interesting problems to solve
slide-54
SLIDE 54

Drawbacks of Shared State

  • resource competition

○ network and I/O congestion
○ context switching
○ out-of-memory errors

  • less predictable

○ constantly changing, dynamic systems
○ non-deterministic waiting
○ feedback loops

slide-55
SLIDE 55

One machine, a host of problems

  • Operating systems are complicated!
  • Your laptop’s kernel solves these scheduling problems well.

slide-56
SLIDE 56
slide-57
SLIDE 57
  • Thus far, we’ve discussed resource utilization on 1 machine.

  • Is 1 machine enough?
  • And what about Mesos?

The Problem of Utilization

slide-58
SLIDE 58

Obviously, 1 machine isn’t enough

  • Problems of scale:

○ Too much data
○ Not enough compute power
○ Everything can’t connect to 1 node

  • Problems of reliability and availability:

○ 1 machine is a single point of failure
○ No redundancy

slide-59
SLIDE 59

Many machines, then?

slide-60
SLIDE 60

Mesos!

slide-61
SLIDE 61

Recall the Box...

[Figure: a box and a computer]

A computer is really just a box

slide-62
SLIDE 62

Mesos is really just a box, too

slide-63
SLIDE 63

AND Mesos is just a Computer

Double Analogy

slide-64
SLIDE 64

Mesos is a Distributed Computer

[Figure: RAM and CPU]

slide-65
SLIDE 65

Mesos is a Distributed Computer

[Figure: RAM and CPU]

  • a lot of machines
  • all solving similar problems

slide-66
SLIDE 66

Mesos is a Distributed Computer

[Figure: RAM and CPU]

  • a lot of machines
  • all solving similar problems
  • We need ways to tell each machine what to do.

slide-67
SLIDE 67

We must rebuild all elements of an operating system in the context of a distributed system!

slide-68
SLIDE 68

We must rebuild all elements of an operating system in the context of a distributed system!

Same old problems, awesome new technology

slide-69
SLIDE 69

Part 2:

slide-70
SLIDE 70

Part 2: How does Mesos do Job Scheduling?

slide-71
SLIDE 71

How Mesos does Job Scheduling

A very big box. Let’s call it the “Grid”.

slide-72
SLIDE 72

How Mesos does Job Scheduling

Mesos Slaves (aka computers or boxes)

The “Grid” holds a lot of smaller boxes. The little boxes are “Slaves”.

slide-73
SLIDE 73

How Mesos does Job Scheduling

[Figure: Mesos Slaves]

Each slave is a partitioned pool of resources (RAM, CPU).

slide-74
SLIDE 74

How Mesos does Job Scheduling

[Figure: Mesos Slaves and Mesos Master]

  • Slaves advertise resources to the Master.
  • The Master packages resources into resource offers.
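A toy model of the master's side of this exchange, assuming one offer per slave that covers all of its unoffered CPU and memory. `ToyMaster` and `Offer` are hypothetical names for illustration only, not the Mesos API.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Offer:
    offer_id: int
    slave_id: str
    cpus: float
    mem_mb: int


class ToyMaster:
    """Hypothetical, heavily simplified model of the master's bookkeeping."""

    def __init__(self):
        self.unoffered = {}       # slave_id -> (cpus, mem_mb) not yet offered or in use
        self._next_id = count(1)

    def register_slave(self, slave_id, cpus, mem_mb):
        # A slave advertises its resources to the master.
        self.unoffered[slave_id] = (cpus, mem_mb)

    def make_offers(self):
        # Package each slave's unoffered resources into one offer, then mark them
        # as outstanding so the same resources are not offered twice.
        offers = [Offer(next(self._next_id), sid, cpus, mem)
                  for sid, (cpus, mem) in self.unoffered.items()
                  if cpus > 0 or mem > 0]
        for offer in offers:
            self.unoffered[offer.slave_id] = (0, 0)
        return offers


master = ToyMaster()
master.register_slave("slave-1", cpus=4, mem_mb=8192)
master.register_slave("slave-2", cpus=2, mem_mb=4096)
print(master.make_offers())
# [Offer(offer_id=1, slave_id='slave-1', cpus=4, mem_mb=8192),
#  Offer(offer_id=2, slave_id='slave-2', cpus=2, mem_mb=4096)]
print(master.make_offers())  # []: everything is already outstanding
```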

slide-75
SLIDE 75

How Mesos does Job Scheduling

[Figure: Mesos Slaves, Mesos Master and Frameworks]

The Master offers resources to frameworks.

slide-76
SLIDE 76

How Mesos does Job Scheduling

[Figure: Mesos Slaves, Mesos Master and Frameworks]

Frameworks accept or reject resource offers.

slide-77
SLIDE 77

How Mesos does Job Scheduling

[Figure: Mesos Slaves, Mesos Master and Frameworks]

Accepted offers result in tasks that do useful work.
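On the framework side, the decision can be as simple as first-fit: launch whatever pending tasks fit in an offer and decline offers that fit nothing. A minimal sketch; the dict shapes and `handle_offers` are hypothetical, and first-fit is just one possible policy.

```python
def handle_offers(offers, pending_tasks):
    """First-fit placement of pending tasks into resource offers.

    offers:        list of {"id": str, "cpus": float, "mem": int}
    pending_tasks: list of (name, cpus, mem); tasks that get placed are removed.
    Returns (launched, declined): launched is a list of (offer_id, [task names]).
    """
    launched, declined = [], []
    for offer in offers:
        cpus, mem = offer["cpus"], offer["mem"]
        placed = []
        for task in list(pending_tasks):  # iterate over a copy while removing
            name, task_cpus, task_mem = task
            if task_cpus <= cpus and task_mem <= mem:
                placed.append(name)
                cpus -= task_cpus
                mem -= task_mem
                pending_tasks.remove(task)
        if placed:
            launched.append((offer["id"], placed))
        else:
            declined.append(offer["id"])
    return launched, declined


offers = [{"id": "o1", "cpus": 4, "mem": 4096}, {"id": "o2", "cpus": 1, "mem": 512}]
tasks = [("a", 2, 1024), ("b", 2, 2048), ("c", 2, 1024)]
print(handle_offers(offers, tasks))  # ([('o1', ['a', 'b'])], ['o2'])
print(tasks)                         # [('c', 2, 1024)]: still waiting
```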

slide-78
SLIDE 78

3 Types of Scheduling Architectures

(aka 3 Types of Distributed Kernels)

Mesos has a two-level architecture.

slide-79
SLIDE 79

3 Types of Scheduling Architectures

from the Google Omega Whitepaper

Mesos Master (manages resource and framework state)

Mesos Frameworks (manage task state)

slide-80
SLIDE 80

3 Types of Scheduling Architectures

from the Google Omega Whitepaper

slide-81
SLIDE 81

3 Types of Scheduling Architectures

(aka 3 Types of Distributed Kernels)

Goal

slide-82
SLIDE 82

3 Types of Scheduling Architectures

(aka 3 Types of Distributed Kernels)

slide-83
SLIDE 83

3 Types of Scheduling Architectures

(aka 3 Types of Distributed Kernels)

Borg (Google)

slide-84
SLIDE 84

Remainder of this talk...

Point out weaknesses in Mesos that:

  • 1. Prevent it from being a shared-state kernel.
  • 2. Can make Mesos challenging to use.
slide-85
SLIDE 85

Remainder of this talk...

  • 1. Optimistic Vs Pessimistic Offers
  • 2. DRF Algorithm and Framework Sorters
  • 3. Missing APIs / Enhancements
slide-86
SLIDE 86

Optimistic Vs Pessimistic Offers

We Trust Everyone!

slide-87
SLIDE 87

Optimistic Vs Pessimistic Offers

Everyone promised not to take my spot.

Protect my spot from thieves!

slide-88
SLIDE 88

Optimistic Vs Pessimistic Offers

slide-89
SLIDE 89

Optimistic Vs Pessimistic Offers

  • Having 2 frameworks share the same resources is not safe.

slide-90
SLIDE 90

Optimistic Vs Pessimistic Offers

  • Having 2 frameworks share the same resources is not safe.
  • A chunk of resources is only offered to a single framework scheduler at a time.
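The pessimistic rule amounts to a lock on each chunk of resources. A toy sketch of that rule; the class and the chunk names are hypothetical.

```python
class PessimisticAllocator:
    """Toy model: each chunk of resources is offered to at most one framework at a time."""

    def __init__(self, chunks):
        self.free = set(chunks)  # chunks that can be offered right now
        self.offered = {}        # chunk -> framework currently holding the offer

    def offer(self, chunk, framework):
        if chunk not in self.free:
            return False         # locked: another framework holds it, or it is in use
        self.free.remove(chunk)
        self.offered[chunk] = framework
        return True

    def accept(self, chunk):
        # The chunk now runs a task; it stays unavailable until the task ends.
        self.offered.pop(chunk)

    def decline(self, chunk):
        # Only after a decline can another framework see these resources.
        self.offered.pop(chunk)
        self.free.add(chunk)


alloc = PessimisticAllocator({"slave-1/chunk-A"})
print(alloc.offer("slave-1/chunk-A", "F1"))  # True
print(alloc.offer("slave-1/chunk-A", "F2"))  # False: F1 holds the offer
alloc.decline("slave-1/chunk-A")
print(alloc.offer("slave-1/chunk-A", "F2"))  # True
```

The safety comes at a cost: while F1 deliberates, F2 cannot even see the chunk.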

slide-91
SLIDE 91

Why is this a problem?

When a framework receives resource offers, it has 2 options:

  • Make an immediate decision
  • Hold onto the offer forever in a state of indecision
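The `--offer_timeout` workaround listed later in the talk puts an upper bound on this indecision: an offer held too long is rescinded. A minimal sketch of that bookkeeping, assuming offers are tracked by the time they were sent; `expire_offers` is a hypothetical helper, not Mesos code.

```python
def expire_offers(outstanding, now, offer_timeout):
    """Rescind offers held for at least `offer_timeout` seconds.

    outstanding: dict of offer_id -> time (in seconds) the offer was sent; modified in place.
    Returns the ids of rescinded offers, whose resources can be offered again.
    """
    expired = [offer_id for offer_id, sent_at in outstanding.items()
               if now - sent_at >= offer_timeout]
    for offer_id in expired:
        del outstanding[offer_id]
    return expired


held = {"o1": 0.0, "o2": 8.0}
print(expire_offers(held, now=10.0, offer_timeout=5.0))  # ['o1']
print(held)                                              # {'o2': 8.0}
```

The trade-off lives in one parameter: a short timeout frees hoarded resources quickly, but it also penalizes frameworks that are merely slow to decide.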

slide-92
SLIDE 92


slide-93
SLIDE 93

Why is this a problem?

Under-utilization: if the framework holds the offer forever, those resources can’t be used… or eaten!

slide-94
SLIDE 94

Why is this a problem?

Under-utilization: it can be hard to schedule large tasks.

slide-95
SLIDE 95

Why is this a problem?

Gaming the system: if it’s hard to schedule large tasks, a framework might hold onto tons of offers until it can schedule its huge task.

slide-96
SLIDE 96

Why is this a problem?

Gaming the system: one could create many instances of a framework to trick Mesos into letting it hoard more offers!

slide-97
SLIDE 97

Workarounds / Solutions

  • --offer_timeout: set short timeouts to penalize slow frameworks

  • MESOS-1607: wait for optimistic offers!

○ Submit one offer to multiple frameworks, but rescind the offer when necessary.
○ Encourages more sophisticated allocation algorithms.
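To make the contrast with pessimistic offers concrete, here is a toy model of the optimistic idea in the MESOS-1607 bullet: offer the same chunk to several frameworks and rescind the other offers once one framework accepts. It assumes a simple first-accept-wins rule; the class is hypothetical and is not a description of how MESOS-1607 was actually designed.

```python
class OptimisticAllocator:
    """Toy model: one chunk may be offered to several frameworks at once.
    The first accept wins; the other outstanding offers for that chunk are rescinded."""

    def __init__(self):
        self.outstanding = {}  # chunk -> set of frameworks holding an offer for it
        self.owner = {}        # chunk -> framework whose task is using it

    def offer(self, chunk, frameworks):
        if chunk in self.owner:
            return             # already in use; nothing to offer
        self.outstanding.setdefault(chunk, set()).update(frameworks)

    def accept(self, chunk, framework):
        """Return (accepted, frameworks whose offers for this chunk are rescinded)."""
        holders = self.outstanding.get(chunk, set())
        if chunk in self.owner or framework not in holders:
            return False, set()  # lost the race: the offer was already rescinded
        self.owner[chunk] = framework
        del self.outstanding[chunk]
        return True, holders - {framework}


alloc = OptimisticAllocator()
alloc.offer("slave-1/chunk-A", {"F1", "F2", "F3"})
accepted, rescinded = alloc.accept("slave-1/chunk-A", "F2")
print(accepted, sorted(rescinded))            # True ['F1', 'F3']
print(alloc.accept("slave-1/chunk-A", "F1"))  # (False, set())
```

Losing a race is now a normal outcome, which is why optimistic offers push allocators and frameworks toward smarter conflict handling.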

slide-98
SLIDE 98

Remainder of this talk...

  • 1. Optimistic Vs Pessimistic Offers
  • 2. DRF Algorithm and Framework Sorter
  • 3. Missing APIs / Enhancements
slide-99
SLIDE 99

DRF and Framework Sorter

slide-100
SLIDE 100

DRF and Framework Sorter

The Mesos Master must choose which frameworks to give offers to first.
slide-101
SLIDE 101

DRF and Framework Sorter

The Mesos Master must choose which frameworks to give offers to first.

In a pessimistic system, this is very important!

slide-102
SLIDE 102

What is DRF?

“Dominant Resource Fairness” Algorithm

slide-103
SLIDE 103

What is DRF?

“Dominant Resource Fairness” Algorithm

  • A method for prioritizing which frameworks to give a resource offer to first.

slide-104
SLIDE 104

What is DRF?

“Dominant Resource Fairness” Algorithm

[Chart: Framework XYZ resource usage, bars of 12%, 30%, 3% and 7%]

We can represent a framework by how many resources it uses.

slide-105
SLIDE 105

What is DRF?

“Dominant Resource Fairness” Algorithm

[Chart: Framework XYZ resource usage, bars of 12%, 30%, 3% and 7%]

We can represent a framework by how many resources it uses. For example:

  • 30% of total RAM
  • 12% of total CPU
slide-106
SLIDE 106

What is DRF?

“Dominant Resource Fairness” Algorithm

[Chart: Framework XYZ resource usage, bars of 12%, 30%, 3% and 7%]

Framework XYZ’s dominant resource is RAM, at 30%.
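A dominant share is computed by dividing each resource the framework uses by the cluster total and taking the maximum. The sketch below reproduces Framework XYZ's numbers, assuming hypothetical cluster totals and assuming the 3% and 7% bars are disk and network (the slide does not label them).

```python
def dominant_share(usage, totals):
    """Return (resource, share) for the resource this framework uses the largest fraction of."""
    shares = {r: usage.get(r, 0) / totals[r] for r in totals}
    resource = max(shares, key=shares.get)
    return resource, shares[resource]


# Hypothetical cluster totals and usage, chosen to match the slide's percentages.
totals = {"cpus": 100, "ram_gb": 1000, "disk_gb": 10000, "net_mbps": 1000}
xyz = {"cpus": 12, "ram_gb": 300, "disk_gb": 300, "net_mbps": 70}
print(dominant_share(xyz, totals))  # ('ram_gb', 0.3)
```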

slide-107
SLIDE 107

How does DRF work?

“Dominant Resource Fairness” Algorithm

[Chart: dominant shares of F1 (30%), F2 (10%) and F3 (20%)]

Identify all frameworks by their dominant resource

slide-108
SLIDE 108

How does DRF work?

“Dominant Resource Fairness” Algorithm

[Chart: dominant shares of F1 (30%), F2 (10%) and F3 (20%)]

Out of all frameworks (F1, F2 and F3), F2 has the minimum dominant share of resources.

slide-109
SLIDE 109

How does DRF work?

“Dominant Resource Fairness” Algorithm

[Chart: F2, F3 and F1 in offer order]

DRF says that as long as resources are available, Mesos should offer resources to F2 first, F3 second, and F1 last.
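The ordering rule is a sort by dominant share, recomputed after every allocation. A minimal sketch using the slide's shares; the 8-point increase per accepted offer is a hypothetical stand-in for "F2 received more resources".

```python
def drf_order(dominant_shares):
    """DRF: offer resources to the framework with the smallest dominant share first."""
    return sorted(dominant_shares, key=dominant_shares.get)


shares = {"F1": 30, "F2": 10, "F3": 20}  # dominant shares in percent, from the slide
print(drf_order(shares))                  # ['F2', 'F3', 'F1']

# The order is recomputed after every allocation. Hypothetically, assume each
# accepted offer raises the winner's dominant share by 8 percentage points.
winners = []
for _ in range(3):
    winner = drf_order(shares)[0]
    shares[winner] += 8
    winners.append(winner)
print(winners)  # ['F2', 'F2', 'F3']
```

Because the winner's share grows with each allocation, DRF keeps nudging frameworks toward equal dominant shares instead of always favoring F2.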

slide-110
SLIDE 110

How does DRF work?

Weighted DRF

[Chart: dominant shares of F1 (30%), F2 (10%) and F3 (20%)]

Per-framework weights, if defined, adjust the dominant share for each framework.

slide-111
SLIDE 111

How does DRF work?

Weighted DRF

[Chart: dominant shares of F1 (30%), F2 (10%) and F3 (20%)]

Per-framework weights, if defined, adjust the dominant share for each framework. Weighting informs Mesos that it should generally prefer some frameworks over others.
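One common way to apply weights, and the one assumed here, is to divide each dominant share by the framework's weight before sorting, so a heavier weight makes a framework look less served than it is. The weight of 4 for F1 is hypothetical.

```python
def weighted_drf_order(dominant_shares, weights):
    """Sort frameworks by dominant share divided by weight (default weight 1)."""
    def weighted(name):
        return dominant_shares[name] / weights.get(name, 1.0)
    return sorted(dominant_shares, key=weighted)


shares = {"F1": 30, "F2": 10, "F3": 20}     # percent, from the slide
weights = {"F1": 4}                          # hypothetical: prefer F1 four to one
print(weighted_drf_order(shares, weights))  # ['F1', 'F2', 'F3']
```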

slide-112
SLIDE 112

DRF is great if...

slide-113
SLIDE 113

DRF is great if...

  • All frameworks have work to do

slide-114
SLIDE 114

DRF is great if...

  • All frameworks have work to do
  • A framework’s “hunger” for more resources does not change over its lifetime

slide-115
SLIDE 115

DRF is great if...

  • All frameworks have work to do
  • A framework’s “hunger” for more resources does not change over its lifetime
  • You know a priori that specific frameworks should use more or fewer resources

slide-116
SLIDE 116

DRF is bad if...

slide-117
SLIDE 117

DRF is bad if...

  • Some frameworks don’t want any more tasks, while others do.

slide-118
SLIDE 118

DRF is bad if...

  • Some frameworks don’t want any more tasks, while others do.
  • The framework's "hunger" for resources changes over its lifetime (perhaps based on queue size or pending web requests)

slide-119
SLIDE 119

DRF Examples

  • Framework 1: 1 task
  • Framework 2: 6 tasks
  • Framework 3: 1 task
  • Framework 4: 30 tasks
  • Framework 5: 1 task
  • Framework 6: 50 tasks

Framework 4 always wants 30 tasks.

slide-120
SLIDE 120

DRF Examples

  • Framework 1: 1 task
  • Framework 2: 6 tasks
  • Framework 3: 1 task
  • Framework 4: 30 tasks
  • Framework 5: 1 task
  • Framework 6: 50 tasks

DRF with weights is great IF these expected ratios never change. Framework 4 always wants 30 tasks.
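When the expected ratios really are fixed, the weights can simply be set to them. A sketch assuming every task has the same size and the cluster fits 100 such tasks (both hypothetical): at the expected task counts every weighted share is equal, and a framework that falls below its target, like Framework 4 at 20 of 30 tasks here, moves to the front of the line.

```python
TOTAL_SLOTS = 100  # hypothetical: the cluster fits 100 equal-size tasks

# Expected task counts from the slide, used directly as DRF weights.
expected = {"Framework 1": 1, "Framework 2": 6, "Framework 3": 1,
            "Framework 4": 30, "Framework 5": 1, "Framework 6": 50}


def weighted_shares(running, weights, total=TOTAL_SLOTS):
    # Dominant share (running / total) divided by weight.
    return {f: running[f] / (weights[f] * total) for f in weights}


at_target = weighted_shares(expected, expected)
print(len(set(at_target.values())))  # 1: every framework looks equally served

running = dict(expected, **{"Framework 4": 20})  # Framework 4 is 10 tasks short
shares = weighted_shares(running, expected)
print(min(shares, key=shares.get))   # Framework 4
```

The catch is the word "always": once Framework 4 stops wanting 30 tasks, its weight keeps reserving priority it no longer needs.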

slide-121
SLIDE 121

DRF Examples

  • Framework 1: 0 tasks
  • Framework 2: 0 tasks
  • Framework 3: 0 tasks
  • Framework 4: 0 tasks
  • Framework 5: 1 task
  • Framework 6: 50 tasks

Sometimes frameworks don’t want to do work.

slide-122
SLIDE 122

DRF Examples

  • Framework 1: 0 tasks
  • Framework 2: 0 tasks
  • Framework 3: 0 tasks
  • Framework 4: 0 tasks
  • Framework 5: 1 task
  • Framework 6: 50 tasks

Sometimes frameworks don’t want to do work.

  • DRF gives preference to the “0 tasks” frameworks.
  • Framework 6 gets starved for resources!
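The starvation effect can be reproduced by walking the DRF order until some framework accepts. A sketch under stated assumptions: equal-size tasks in a 100-task cluster, and only Framework 6 wants more work (the slide does not say whether Framework 5 does). `first_taker` is a hypothetical helper, not the Mesos allocator.

```python
def first_taker(dominant_shares, wants_more):
    """Walk frameworks in DRF order until one accepts; count the declines on the way."""
    declines = 0
    for name in sorted(dominant_shares, key=dominant_shares.get):
        if wants_more[name]:
            return name, declines
        declines += 1  # an idle framework holds, then declines, the offer
    return None, declines


tasks = {"Framework 1": 0, "Framework 2": 0, "Framework 3": 0,
         "Framework 4": 0, "Framework 5": 1, "Framework 6": 50}
shares = {f: n / 100 for f, n in tasks.items()}  # hypothetical: 100 equal-size slots
wants = {f: f == "Framework 6" for f in tasks}   # assumption: only Framework 6 is hungry
print(first_taker(shares, wants))                # ('Framework 6', 5)
```

Every idle framework ahead of Framework 6 costs an offer round trip before the resources reach it. Mesos lets a declining framework attach a refusal filter (`refuse_seconds`) so it is skipped for a while; this toy ignores that.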

slide-123
SLIDE 123

DRF Examples

Framework 1

0 tasks

Framework 2

0 tasks

Framework 4

0 tasks

Framework 6

50 tasks

Framework 3

0 task

Framework 5

1 task

Sometimes frameworks don’t want to do work

  • DRF gives preference

to the “0 tasks” frameworks.

  • Framework 6 gets

starved for resources!

slide-124
SLIDE 124

Real-world Examples of Bad DRF

Any framework that declines usable offers suggests DRF isn’t working well:

  • A consumer framework that consumes an occasionally empty queue
  • A web server framework that sometimes doesn’t get a lot of requests
  • A database framework that sometimes doesn’t have a lot to do

slide-125
SLIDE 125

Workarounds / Solutions

  • Ensure all your frameworks always want more tasks

○ Can be very hard, perhaps impossible, to do.
○ For example, what if a framework just maintains N services?
○ Might encourage sloppy or inefficient frameworks.

slide-126
SLIDE 126

Workarounds / Solutions

  • Write your own allocation algorithm!

○ See Li Jin’s 11:50 talk, "Preemptive Task Scheduling in Mesos Framework"
○ Maybe other talks?

slide-127
SLIDE 127
Workarounds / Solutions

  • Wait for optimistic offers to make this less of an issue.
  • Allow frameworks to periodically restart themselves and define a different DRF weighting every time they restart.

slide-128
SLIDE 128

DRF Speculation

  • A really good dynamic weighting algorithm would benefit from knowledge of the current distribution of weights across the other frameworks in the system.

○ Frameworks could compete with each other based on this information.
○ This makes Mesos more like a shared-state scheduler.
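In the same speculative spirit, a framework could derive its weight from its backlog relative to everyone else's. Everything here is hypothetical: `dynamic_weights`, the backlog numbers, and the 0.05 floor that keeps idle frameworks from dropping to zero weight.

```python
def dynamic_weights(backlogs, floor=0.05):
    """Weight each framework by its fraction of the total pending work across the system.

    backlogs: framework -> number of pending work items (e.g. queue length).
    The floor keeps idle frameworks from being weighted to zero.
    """
    total = sum(backlogs.values())
    if total == 0:
        return {f: 1.0 for f in backlogs}  # nobody is hungry: fall back to equal weights
    return {f: max(floor, n / total) for f, n in backlogs.items()}


print(dynamic_weights({"consumer": 0, "web": 25, "batch": 75}))
# {'consumer': 0.05, 'web': 0.25, 'batch': 0.75}
```

This only works if each framework can see the system-wide total, which is exactly the shared-state knowledge the slide says Mesos would need.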

slide-129
SLIDE 129

Remainder of this talk...

  • 1. Optimistic Vs Pessimistic Offers
  • 2. DRF Algorithm and Framework Sorter
  • 3. Missing APIs / Enhancements
slide-130
SLIDE 130
slide-131
SLIDE 131

These are my opinions. I’m not sure whether others will agree. If you have opinions too, let’s get beers tonight!

slide-132
SLIDE 132

Missing APIs / Enhancements

  • In my opinion, different framework sorter algorithms, and even optimistic offers, will only take us so far.

slide-133
SLIDE 133

Missing APIs / Enhancements

  • Frameworks should more actively leverage statistics about resource utilization to inform the Mesos master about how resources should be allocated.

slide-134
SLIDE 134

Missing APIs / Enhancements

  • Frameworks should more actively leverage statistics about resource utilization to inform the Mesos master about how resources should be allocated.

○ Frameworks know their resource needs better than the Master.
○ Some frameworks can make simple decisions.
○ Others can be smart in how they wish to populate the grid.

slide-135
SLIDE 135

Missing APIs / Enhancements

  • Frameworks should be able to tell Mesos what they will want in the future (and how badly they want it).

○ Let the framework developer community play the game to “optimize this scheduling problem”.

  • The DRF algorithm, or the hierarchical allocator in general, should leverage historical data.
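The simplest form of "leveraging historical data" would be a forecast of each framework's demand. A minimal sketch using an exponentially weighted moving average; the choice of EWMA, the `alpha` value, and the idea of handing the result to the allocator are all assumptions for illustration, not an existing Mesos feature.

```python
def ewma_forecast(history, alpha=0.5):
    """Exponentially weighted moving average: recent observations count more.

    history: past demand samples, oldest first (e.g. CPUs requested per minute).
    alpha:   weight of the newest sample, between 0 and 1.
    """
    if not history:
        raise ValueError("need at least one observation")
    estimate = history[0]
    for observed in history[1:]:
        estimate = alpha * observed + (1 - alpha) * estimate
    return estimate


print(ewma_forecast([4, 8]))         # 6.0
print(ewma_forecast([2, 2, 2, 10]))  # 6.0
```

A framework could send such a forecast, together with how urgently it needs the resources, as the "what they will want in the future" signal the bullet above asks for.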

slide-136
SLIDE 136

For more about our story, check out this talk at 4:50!

slide-137
SLIDE 137