Scheduling Shared Continuous Resources On Many-Cores


slide-1
SLIDE 1

Scheduling Shared Continuous Resources On Many-Cores

PRESENTER: LIOR BELINSKY

AUTHORS: ANDRE BRINKMANN - PETER KLING - TIM SÜSS - LARS NAGEL - SÖREN RIECHERS - FRIEDHELM MEYER AUF DER HEIDE

slide-2
SLIDE 2

Presentation Contents

  • First Part: Review the process scheduling problem [~30 minutes]
  • Second Part: Algorithms and Approximations [~20 minutes]

slide-3
SLIDE 3

First Part

  • Introduction to the CRSHARING Problem
  • Motivation & Usage
  • Hypergraph Representation
  • Complexity of the Problem (NPC)

slide-4
SLIDE 4

Continuous Resource Sharing (CRSHARING)

We consider the problem of scheduling a number of jobs on $m$ identical processors sharing a continuously divisible resource. Time is discrete and divided into time steps. At every time step $t$, the scheduler distributes the resource among the processors: each processor $i$ is assigned a share $R_i(t)$ of the resource that it can use at time step $t$. For each processor there is a sequence of jobs to process in the given order. The $j$-th job on processor $i$ is denoted $(i, j)$.

slide-5
SLIDE 5

Continuous Resource Sharing (CRSHARING)

Consider a job $(i, j)$ whose processing started at time step $t_0$. The job arrives with a resource requirement $r_{ij}$ and a processing volume (size) $p_{ij}$.

The job is granted a share $R_i(t)$ of the resource, and thus

  • $\min(R_i(t)/r_{ij},\, 1)$ units of $p_{ij}$ are processed at time step $t$.

Therefore, after time step $t$ finishes, the remaining processing volume is $p_{ij} - \sum_{s=t_0}^{t} \min(R_i(s)/r_{ij},\, 1)$.

  • The job is finished at the minimal time step $t_1 \geq t_0$ with:

$\sum_{t=t_0}^{t_1} \min(R_i(t)/r_{ij},\, 1) \geq p_{ij}$
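The finishing rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `finishing_step` is hypothetical, the requirement `r` is assumed to be strictly positive, and the processing speed in a step is taken to be min(R(t)/r, 1).

```python
from fractions import Fraction

def finishing_step(shares, r, p, t0=1):
    """Return the first time step at which the job is finished, or None.

    shares[k] is the resource share granted at time step t0 + k (assumed input format),
    r is the job's resource requirement (assumed > 0), p its processing volume.
    """
    processed = Fraction(0)
    for t, share in enumerate(shares, start=t0):
        # With share R(t), the job runs at speed min(R(t)/r, 1) in this step.
        processed += min(Fraction(share) / Fraction(r), 1)
        if processed >= p:
            return t
    return None  # the given shares do not suffice to finish the job

# A job with r = 1/2 and p = 2 that receives the shares 1/4, 1/2, 1/2
# runs at speeds 1/2, 1, 1 and therefore finishes in its third time step.
print(finishing_step([Fraction(1, 4), Fraction(1, 2), Fraction(1, 2)], Fraction(1, 2), 2))
```

Exact fractions are used instead of floats so that the comparison with `p` is not affected by rounding.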

slide-6
SLIDE 6

What Does a Solution Look Like?

Goal

Find a resource assignment to processors that minimizes the makespan, i.e. the time step at which the last job finishes.

Feasible Solution

A schedule consists of $m$ resource assignment functions $R_i(t)$, one per processor, that specify the resource's distribution among the processors for all time steps, without overusing the resource.

In Other Words

Our goal is to find a feasible schedule (solution) with minimal makespan.

slide-7
SLIDE 7

Scheduler Limitations

System resource limit - $\forall t: \sum_{i=1}^{m} R_i(t) \leq 1$

Per-job resource limit - $\forall t$: the resource share of each job is capped by its resource requirement, i.e. $R_i(t) \leq r_{ij}$ for the active job $(i, j)$.

  • Observation 1 – any feasible schedule for our problem needs at least

$\sum_{i=1}^{m} \sum_{j=1}^{n_i} r_{ij}\, p_{ij}$

time steps to finish a given set of jobs.
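The two limits and the lower bound of Observation 1 can be checked mechanically. The following is a minimal sketch, assuming jobs are given as `(r, p)` pairs per processor; the function names are hypothetical. Because time steps are integral, the sketch rounds the bound up.

```python
from fractions import Fraction
from math import ceil

def respects_system_limit(shares_per_step):
    """True if in every time step the granted shares sum to at most 1."""
    return all(sum(step) <= 1 for step in shares_per_step)

def makespan_lower_bound(jobs):
    """jobs[i] lists the (r, p) pairs of processor i (assumed input format).

    Each time step supplies at most one unit of the resource, and job (i, j)
    consumes r * p units in total, so the sum of r * p bounds the makespan from below.
    """
    total = sum(Fraction(r) * Fraction(p) for processor in jobs for r, p in processor)
    return ceil(total)

jobs = [
    [(Fraction(1, 2), 1), (Fraction(1, 2), 1)],  # processor 1
    [(Fraction(3, 4), 2)],                       # processor 2
]
print(makespan_lower_bound(jobs))  # total requirement is 5/2
```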

slide-8
SLIDE 8

CRSHARING Simplified Model

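Consider a job $(i, j)$ whose processing started at time step $t_0$. The job arrives with a resource requirement $r_{ij}$ and a processing volume $p_{ij}$.

The job is granted a share $R_i(t) \leq r_{ij}$ of the resource, and thus

  • $R_i(t)$ units of its total requirement $r_{ij}\, p_{ij}$ are processed at time step $t$.

Therefore, after time step $t$ finishes, the remaining requirement is $r_{ij}\, p_{ij} - \sum_{s=t_0}^{t} R_i(s)$.

  • The job is finished at the minimal time step $t_1 \geq t_0$ with:

$\sum_{t=t_0}^{t_1} R_i(t) \geq r_{ij}\, p_{ij}$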

slide-9
SLIDE 9

Motivation & Usage

Improving Computational Performance

Devices and energy consumption are not the only bottlenecks of a computation. A good distribution of the bandwidth (resource) shared by the processors can speed up the computation.

Usage Examples

  • Many-core systems - a chip's cores share a single data bus to the outside world.
  • Virtual systems - different virtual machines share a single divisible resource of a given host system.

slide-10
SLIDE 10

Example For a Better Understanding

The problem is somewhat similar to running machines at the gym.

slide-11
SLIDE 11

Additional Terms & Notions

Term/Notation: Meaning

  • $(i, j)$: the $j$-th job on processor $i$
  • $R_i(t)$: the share of the resource granted to processor $i$ at time step $t$
  • $n_i$: the number of jobs that will be processed by processor $i$
  • $n_i(t)$: the number of unfinished jobs on processor $i$ at the start of time step $t$
  • Active job: job $(i, j)$ is active in time step $t$ if $n_i - n_i(t) = j - 1$
  • Active processor: processor $i$ is active in time step $t$ if $n_i(t) > 0$
  • $M_j := \{ i \mid n_i \geq j \}$: the set of all processors having at least $j$ jobs to process
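To make the notation concrete, here is a small Python sketch of two of the notions: the set of processors with at least j jobs, and the index of the active job. The helper names are hypothetical, and processors are numbered from 1 as an assumption.

```python
def processors_with_at_least(job_counts, j):
    """Set of processors (numbered from 1) that have at least j jobs to process."""
    return {i for i, n_i in enumerate(job_counts, start=1) if n_i >= j}

def active_job_index(n_i, unfinished):
    """Index j of the active job on a processor with n_i jobs in total,
    given its number of unfinished jobs at the start of the time step.

    Job (i, j) is active when n_i - unfinished = j - 1; an idle processor has no active job.
    """
    if unfinished == 0:
        return None
    return n_i - unfinished + 1

job_counts = [3, 1, 2]  # processors 1, 2, 3 have 3, 1 and 2 jobs
print(processors_with_at_least(job_counts, 2))
print(active_job_index(3, 2))
```

With three, one and two jobs on the three processors, only processors 1 and 3 have a second job, and a processor with three jobs of which two are unfinished is working on its second job.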

slide-12
SLIDE 12

Graphical Representation

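A hypergraph $H = (V, E)$ consists of a finite set $V$ of vertices and a set $E$ of edges, each of which is a non-empty subset of $V$. For example:

$V = \{v_1, v_2, \ldots, v_7\}$, $E = \{e_1, e_2, e_3, e_4\}$

$e_1 = \{v_1, v_2, v_3\}$, $e_2 = \{v_2, v_3\}$, $e_3 = \{v_3, v_5, v_6\}$, $e_4 = \{v_4\}$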
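Since the following slides rely on connected components of a hypergraph, a short sketch of how they can be computed may help. It assumes the example edges read e1 = {v1, v2, v3}, e2 = {v2, v3}, e3 = {v3, v5, v6}, e4 = {v4}; the function name is hypothetical, and two edges are treated as connected when they share a vertex.

```python
def hyperedge_components(edges):
    """Group hyperedges into connected components.

    edges maps an edge name to its set of vertices. Returns a list of
    (vertex_set, edge_names) pairs, one per component.
    """
    components = []
    for name, vertices in edges.items():
        merged_vertices, merged_edges = set(vertices), [name]
        untouched = []
        for comp_vertices, comp_edges in components:
            if comp_vertices & merged_vertices:
                # The new edge shares a vertex with this component: merge them.
                merged_vertices |= comp_vertices
                merged_edges = comp_edges + merged_edges
            else:
                untouched.append((comp_vertices, comp_edges))
        components = untouched + [(merged_vertices, merged_edges)]
    return components

example = {
    "e1": {"v1", "v2", "v3"},
    "e2": {"v2", "v3"},
    "e3": {"v3", "v5", "v6"},
    "e4": {"v4"},
}
for vertex_set, edge_names in hyperedge_components(example):
    print(sorted(vertex_set), edge_names)
```

In this example e1, e2 and e3 form one component because they all contain v3, while e4 forms a component of its own.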

slide-13
SLIDE 13

Model’s HyperGraph Representation

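Given a problem instance of CRSHARING with unit size jobs and a corresponding schedule $S$, we define a weighted hypergraph $H_S = (V, E)$, named the scheduling graph of $S$:

  • $V$ is the set of all jobs $(i, j)$.
  • $E = \{e_1, e_2, \ldots, e_{|S|}\}$, where for every time step $t$ of $S$ the edge $e_t \subseteq V$ is defined as $e_t := \{ (i, j) \in V \mid \text{job } (i, j) \text{ is active at time step } t \}$.

In short: $V$ = jobs, $E$ = time steps, $e_t$ = active jobs at time step $t$.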
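The construction can be sketched as follows, reading it as: vertices are jobs, and the edge of time step t contains the jobs active at t. The input format is an assumption: `unfinished[t-1][i-1]` holds the number of unfinished jobs of processor i at the start of time step t. The function name is hypothetical, and vertex weights are omitted.

```python
def scheduling_graph(job_counts, unfinished):
    """Build the vertices and edges of the scheduling graph of a schedule.

    job_counts[i-1] is the number of jobs of processor i; unfinished[t-1][i-1] is the
    number of its jobs still unfinished at the start of time step t.
    """
    vertices = {
        (i, j)
        for i, n_i in enumerate(job_counts, start=1)
        for j in range(1, n_i + 1)
    }
    edges = {}
    for t, row in enumerate(unfinished, start=1):
        # The active job on processor i is number n_i - (unfinished jobs) + 1.
        edges[t] = {
            (i, job_counts[i - 1] - left + 1)
            for i, left in enumerate(row, start=1)
            if left > 0
        }
    return vertices, edges

# Two processors with 2 and 1 jobs, and a schedule of three time steps.
vertices, edges = scheduling_graph([2, 1], [[2, 1], [1, 1], [1, 0]])
print(edges)
```

Here job (2, 1) is active in the first two time steps, so it belongs to both of their edges and connects them into one component.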

slide-14
SLIDE 14

Scheduling Graph Of S - Illustration

[Figure: scheduling graph of a schedule $S$ on three processors (Processor 1, Processor 2, Processor 3); $V$ = jobs, $E$ = time steps, $e_t$ = active jobs at time step $t$]

slide-15
SLIDE 15

Connected Components

The connected components formed by the edges of the scheduling graph $H_S$ carry a lot of information about the schedule.

slide-16
SLIDE 16

Connected Components

Notation: Meaning

  • $N$: the number of connected components
  • $C_k$: the $k$-th connected component ($k \in [N]$)
  • $\#_k$: the number of edges of the $k$-th component
  • $q_k$: the size of the first edge in the $k$-th component (component class)

slide-17
SLIDE 17

Connected Components

Observation 2 – consider a connected component $C_k$ of $H_S$ and two time steps $t_1 \leq t_2$ with $e_{t_1}, e_{t_2} \subseteq C_k$. Then for all $t \in \{t_1, \ldots, t_2\}$ we have $e_t \subseteq C_k$.

slide-18
SLIDE 18

CRSHARING Complexity

Theorem – CRSHARING with jobs of unit size is NP-hard if the number of processors is part of the input.

Proof Highlight

Reduction from the PARTITION problem: a PARTITION instance, given by its elements, is transformed into a CRSHARING instance with 3 jobs on each processor.

Open Question

For a constant number of processors $m \geq 3$, does CRSHARING remain NP-hard?

slide-19
SLIDE 19

Second Part

  • RoundRobin approximation
  • Unique properties expected of a feasible schedule (solution)
  • Algorithm for 2 processors
  • A $(2 - 1/m)$-approximation for $m$ processors

slide-20
SLIDE 20

Round Robin Approximation

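Let $n$ be the maximal number of jobs on a processor. The algorithm operates in $n$ phases such that during phase $j$ it processes the $j$-th job on every remaining processor, i.e. on every processor in $M_j$.

Theorem - The RoundRobin algorithm for the CRSHARING problem with unit size jobs has a worst-case approximation ratio of exactly 2.

PROOF

The $j$-th phase requires exactly $\lceil \sum_{i \in M_j} r_{ij} \rceil$ time steps. Hence all phases together last

$\sum_{j=1}^{n} \lceil \sum_{i \in M_j} r_{ij} \rceil \leq \sum_{j=1}^{n} \sum_{i \in M_j} r_{ij} + n \leq OPT + OPT = 2 \cdot OPT$

time steps. The last inequality uses $OPT \geq \sum_{j} \sum_{i \in M_j} r_{ij}$ (Observation 1) and $OPT \geq n$, since a processor finishes at most one job per time step.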
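The phase structure of RoundRobin is easy to simulate. The sketch below only computes the resulting makespan for unit size jobs, assuming each phase takes the rounded-up sum of the requirements of its jobs; the function name and the input format (one list of requirements per processor) are hypothetical.

```python
from fractions import Fraction
from math import ceil

def round_robin_makespan(requirements):
    """Makespan of RoundRobin for unit size jobs.

    requirements[i][j] is the resource requirement of the (j+1)-th job on processor i+1.
    Phase j handles the j-th job of every processor that still has one.
    """
    n = max((len(jobs) for jobs in requirements), default=0)
    makespan = 0
    for j in range(n):
        phase_load = sum(jobs[j] for jobs in requirements if len(jobs) > j)
        makespan += ceil(phase_load)  # time steps needed by this phase
    return makespan

instance = [
    [Fraction(3, 4), Fraction(1, 4)],  # processor 1
    [Fraction(3, 4), Fraction(1, 4)],  # processor 2
]
print(round_robin_makespan(instance))
```

For this instance the first phase has load 3/2 and needs two time steps, while the second phase has load 1/2 and needs one, so RoundRobin uses three time steps in total.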

slide-21
SLIDE 21

Structural Properties

Each reasonable schedule for the CRSHARING problem should have the following basic properties:

  • Non-wasting – finishes all active jobs during every time step $t$ with $\sum_{i=1}^{m} R_i(t) < 1$.
  • Progressive – among all jobs that are assigned resources, at most one job is only partially processed during any time step $t$. More formally: for every $t$, at most one processor $i$ with $R_i(t) > 0$ does not finish its active job in time step $t$.
  • Balanced – whenever a processor $i$ finishes a job at time step $t$, any processor $i'$ with $n_{i'}(t) > n_i(t)$ does also finish a job.

slide-22
SLIDE 22

Non-Wasting & Progressive Schedules

  • Non-wasting – finishes all active jobs during every time step $t$ with $\sum_{i=1}^{m} R_i(t) < 1$.
  • Progressive – among all jobs that are assigned resources, at most one job is only partially processed during any time step $t$. More formally: for every $t$, at most one processor $i$ with $R_i(t) > 0$ does not finish its active job in time step $t$.

Given an arbitrary schedule $S$, we can transform it into a non-wasting and progressive schedule $S'$ with $|S'| \leq |S|$. Moreover, the resulting schedule $S'$ finishes at least one job per time step.

slide-23
SLIDE 23

Balanced Schedules

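Balanced – whenever a processor $i$ finishes a job at time step $t$, any processor $i'$ with $n_{i'}(t) > n_i(t)$ does also finish a job.

For every balanced schedule, any two processors $i_1, i_2$ with $n_{i_1} \geq n_{i_2}$, and all time steps $t$, the numbers of unfinished jobs $n_{i_1}(t)$ and $n_{i_2}(t)$ satisfy a two-sided bound in terms of $n_{i_1}$ and $n_{i_2}$. [Inequality not recoverable from the source.]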

slide-24
SLIDE 24

Balanced Schedules

Lemma

Consider a non-wasting, progressive, and balanced schedule. The numbers of vertices and edges in a component are related via the following properties:

  (a) The inequality $|C_k| \geq \#_k + q_k - 1$ holds for all $k \in \{1, 2, \ldots, N - 1\}$.
  (b) The last component satisfies $|C_N| \geq \#_N$.

slide-25
SLIDE 25

Lemma Proof (a)

The inequality $|C_k| \geq \#_k + q_k - 1$ holds for all $k \in \{1, 2, \ldots, N - 1\}$.

Let $k \in [N - 1]$, and let $e_t \in E$ be the first edge in $C_k$, i.e. for all $e_{t'} \subseteq C_k$ we have $t \leq t'$.

Thus, $|e_t| = q_k$.

For all the other edges in $C_k$, there is at least one new vertex in every edge, since at least one job must finish in every time step.

Therefore:

$|C_k| \geq \underbrace{q_k}_{\text{first row}} + \underbrace{\#_k - 1}_{\text{anything but first row}}$

slide-26
SLIDE 26

Lemma Proof (b)

The last component satisfies $|C_N| \geq \#_N$.

At least one job must finish in every time step. Each edge represents all the active jobs in a specific time step, and a component contains at least one edge (time step). Therefore:

$|C_N| \geq \#_N$

slide-27
SLIDE 27

Algorithm For 2 Processors

The OptResAssignment algorithm uses a dynamic programming approach. Let $B$ be a two-dimensional array of size $n^2$.

  • An entry $B[j_1, j_2] = (t, r)$ states that there is a schedule that, at time step $t$, has finished all jobs preceding $(1, j_1)$ and $(2, j_2)$. Here $r$ is the remaining total resource requirement of $(1, j_1)$ and $(2, j_2)$. [The closing formula for $r$ is not recoverable from the source.]

Invariant

At phase $\ell$, all entries on the $(\ell - 1)$-th diagonal (i.e., all $B[j_1, j_2]$ with $j_1 + j_2 = \ell$) correspond to subschedules with minimal $t$ (and, for this $t$, minimal $r$) reaching the jobs $(1, j_1)$ and $(2, j_2)$.

Output

slide-28
SLIDE 28

Analysis

Runtime - $O(n^2)$

$O(n)$ phases – $B$ has $2n - 1$ diagonals. Each phase considers the $O(n)$ entries on the corresponding diagonal.

slide-29
SLIDE 29

Correctness

We prove the invariant's correctness by induction on the phase number.

  • Base: in the first phase, the entry $B[1, 1]$ is trivially correct, as there are no jobs preceding $(1, 1)$ and $(2, 1)$.
  • Step: assume the invariant holds for the first $\ell$ phases, and consider an entry $B[j_1, j_2]$ processed in the $(\ell + 1)$-th phase. This entry corresponds to a subschedule that has processed all jobs preceding $(1, j_1)$ and $(2, j_2)$. Since each processor can finish at most one job per time step, this subschedule must originate from one of the following subschedules: $B[j_1 - 1, j_2]$, $B[j_1, j_2 - 1]$, or $B[j_1 - 1, j_2 - 1]$.

slide-30
SLIDE 30

$(2 - 1/m)$-Approximation

Consider a CRSHARING instance and a feasible schedule $S$ for it that is non-wasting, progressive and balanced. Then $S$ is a $(2 - 1/m)$-approximation with respect to the optimal makespan.

Let $\bar{\#}$ denote the average number of edges in a component. Our proof uses two bounds on the approximation ratio.

slide-31
SLIDE 31

Tight Approximation Algorithm

GreedyBalance is an example of a greedy algorithm that produces balanced schedules. It prioritizes processors with a higher number of remaining jobs and, in case of a tie, prioritizes jobs with a larger remaining resource requirement.

The GreedyBalance algorithm for the CRSHARING problem has a worst-case approximation ratio of exactly $2 - 1/m$.
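The priority rule can be illustrated with a short simulation. This is a sketch of one plausible reading of the rule, not the authors' implementation: it assumes unit size jobs, that a processor works on at most one job per time step, and that the resource is handed out greedily in priority order. The function name is hypothetical.

```python
from fractions import Fraction

def greedy_balance_makespan(requirements):
    """Simulate a greedy, balance-oriented schedule for unit size jobs.

    requirements[i] lists the resource requirements of processor i's jobs in order.
    Returns the number of time steps used.
    """
    # remaining[i] holds the remaining requirements of processor i's unfinished jobs.
    remaining = [[Fraction(r) for r in jobs] for jobs in requirements]
    steps = 0
    while any(remaining):
        steps += 1
        budget = Fraction(1)  # one unit of the resource per time step
        active = [i for i, jobs in enumerate(remaining) if jobs]
        # More remaining jobs first; ties broken by larger remaining requirement.
        active.sort(key=lambda i: (len(remaining[i]), remaining[i][0]), reverse=True)
        for i in active:
            share = min(budget, remaining[i][0])
            budget -= share
            remaining[i][0] -= share
            if remaining[i][0] == 0:
                remaining[i].pop(0)  # the active job is finished
            if budget == 0:
                break
    return steps

instance = [
    [Fraction(3, 4), Fraction(1, 4)],  # processor 1
    [Fraction(3, 4), Fraction(1, 4)],  # processor 2
]
print(greedy_balance_makespan(instance))
```

Handing out the resource in priority order means that at most one job per time step receives less than it still needs, which matches the progressive property.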

slide-32
SLIDE 32

Tight Approximation Algorithm


slide-33
SLIDE 33

Summary & Outlook

What Have We Seen

  • A new resource scheduling problem, where job processing depends on the share of the resource a job is assigned.
  • An efficient optimal algorithm for 2 processors.
  • An approximation algorithm with a worst-case approximation ratio of $2 - 1/m$ for $m$ processors.

Authors' Outlook

The authors believe that it is possible to obtain some analytical results for arbitrary job sizes.

slide-34
SLIDE 34

Before We Finish


slide-35
SLIDE 35
