CooRMv2: An RMS with Support for Non-predictably Evolving Applications



SLIDE 1

CooRMv2: An RMS with Support for Non-predictably Evolving Applications

Cristian KLEIN, Christian PÉREZ

Avalon / GRAAL, INRIA / LIP, ENS de Lyon

Scheduling Workshop, May 29–June 1, 2011, Aussois

Cristian KLEIN (INRIA) CooRMv2 Scheduling in Aussois 1 / 19

SLIDE 2

Adaptive Mesh Refinement Applications (AMR)

• Mesh is dynamically refined / coarsened as required by numerical precision
  ◮ Memory requirements increase / decrease
  ◮ Amount of parallelism increases / decreases
• Generally evolves non-predictably

[Figure: normalized data size vs. step number]
[Figure: duration of a step (s) vs. number of nodes, for data sizes of 12 GiB, 48 GiB, 196 GiB, 784 GiB and 3136 GiB]

• Goal: maintain a given target efficiency

SLIDE 3

Executing AMR applications on HPC resources (1/2)

Use static allocations (rigid jobs)

• E.g., cluster, supercomputing batch schedulers
• Evolution is not known in advance
  → User is forced to over-allocate
  → Inefficient resource usage
• Example: target efficiency 75% (±10%)

[Figure: number of nodes vs. relative data size]

• Ideally, unused resources should be filled by other applications
  ◮ Needs support from the Resource Management System (RMS)

SLIDE 4

Executing AMR applications on HPC resources (2/2)

Use dynamic allocations

• Malleable jobs: RMS tells applications to grow/shrink
• Clouds: “The illusion of infinite computing resources available on demand”
  ◮ Infinite? Actually up to 20
  ◮ Even without this limit: “Out of capacity” errors
    → Application may run out-of-memory

Ideally, RMS guarantees the availability of resources to an AMR application?

SLIDE 9

Problem

A Resource Management System (RMS) which allows non-predictably evolving applications to:

• Use resources efficiently
• Rely on guaranteed availability of resources

SLIDE 10

1 Introduction
2 CooRMv2
    Resource Requests
    High-level Operations
    Views
    Scheduling Algorithm
3 Application Examples
    Non-predictably Evolving: Adaptive Mesh Refinement
    Malleable: Parameter-Sweep Application
4 Results
5 Conclusions

SLIDE 12

Resource Requests

• Cluster ID, number of nodes, duration
• RMS chooses start time → node IDs are allocated to the application
• Type
  ◮ Non-preemptible (default in major RMSs)
  ◮ Preemptible (think OAR best-effort jobs)
  ◮ Pre-allocation: “I do not currently need these resources, but make sure I can get them immediately if I need them.”
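The request fields listed on this slide can be pictured as a small record. The following is a minimal sketch only: the names `RequestType`, `ResourceRequest`, `kind` and `node_seconds` are hypothetical and are not taken from the CooRMv2 interface.

```python
from dataclasses import dataclass
from enum import Enum


class RequestType(Enum):
    """The three request types named on the slide."""
    NON_PREEMPTIBLE = "non-preemptible"
    PREEMPTIBLE = "preemptible"
    PRE_ALLOCATION = "pre-allocation"


@dataclass(frozen=True)
class ResourceRequest:
    """Hypothetical record for one request: cluster ID, node count, duration, type.

    The start time is absent on purpose: the slide says the RMS chooses it.
    """
    cluster_id: str
    node_count: int
    duration_s: float
    kind: RequestType = RequestType.NON_PREEMPTIBLE

    def __post_init__(self) -> None:
        if self.node_count <= 0:
            raise ValueError("node_count must be positive")
        if self.duration_s <= 0:
            raise ValueError("duration_s must be positive")

    def node_seconds(self) -> float:
        """Resource area covered by the request, in nodes x seconds."""
        return self.node_count * self.duration_s


# Usage: reserve room to grow now, then request nodes inside it when needed.
pre_alloc = ResourceRequest("cluster-a", 64, 3600.0, RequestType.PRE_ALLOCATION)
first_use = ResourceRequest("cluster-a", 8, 600.0)
```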

SLIDE 16

High-level Operations

Low-level Operations

• CooRMv2 defines simple, low-level operations on requests

High-level Operations

• An update is guaranteed to succeed only inside a pre-allocation

SLIDE 17

Views

• Apps need to adapt their requests to the availability of the resources
• Each app is presented with two views: non-preemptible, preemptible
• Preemptible view informs when resources need to be preempted

[Figure: number of nodes vs. time (minutes), showing the preemptible view and the non-preemptible view]
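One way to read the figure is that a view is a step function from time to a number of available nodes. The sketch below models it that way; the `View` class and its methods are hypothetical, and the slides do not say how CooRMv2 actually encodes views.

```python
from bisect import bisect_right
from typing import List, Tuple


class View:
    """Step function: (start_time_s, nodes) pairs; a value holds until the next step."""

    def __init__(self, steps: List[Tuple[float, int]]) -> None:
        if not steps:
            raise ValueError("a view needs at least one step")
        self.steps = sorted(steps)
        self.times = [t for t, _ in self.steps]

    def nodes_at(self, t: float) -> int:
        """Nodes available at time t (0 before the first step)."""
        i = bisect_right(self.times, t) - 1
        return self.steps[i][1] if i >= 0 else 0

    def min_nodes(self, start: float, end: float) -> int:
        """Smallest number of nodes available anywhere in [start, end)."""
        lowest = self.nodes_at(start)
        for t, nodes in self.steps:
            if start < t < end:
                lowest = min(lowest, nodes)
        return lowest


# Usage: 10 nodes now, shrinking to 4 after 10 minutes, growing to 12 after 30.
preemptible = View([(0.0, 10), (600.0, 4), (1800.0, 12)])
```

An application can compare `nodes_at(now)` with `min_nodes(now, now + d)` to see whether nodes it holds now will have to be given back within the next `d` seconds.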

SLIDE 18

Scheduling Algorithm

• Pre-allocations and non-preemptible requests
  ◮ Conservative Back-Filling (CBF)
• Preemptible requests
  ◮ Equi-partitioning
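Equi-partitioning of preemptible resources can be sketched as follows. This is an illustration under assumptions: the function name `equi_partition`, the capping of each share by the application's demand, and the redistribution of leftover nodes are choices made here, not details given in the slides.

```python
from typing import Dict


def equi_partition(available: int, demands: Dict[str, int]) -> Dict[str, int]:
    """Split `available` nodes equally among applications, capped by demand.

    Nodes that an application cannot use are redistributed among the others.
    Dictionary order is treated as arrival order when breaking ties.
    """
    alloc = {app: 0 for app in demands}
    remaining = available
    active = [app for app in demands if demands[app] > 0]
    while remaining > 0 and active:
        share = remaining // len(active)
        if share == 0:
            # Fewer nodes than applications: one node each, earliest first.
            for app in active[:remaining]:
                alloc[app] += 1
            break
        for app in active:
            give = min(share, demands[app] - alloc[app])
            alloc[app] += give
            remaining -= give
        active = [app for app in active if alloc[app] < demands[app]]
    return alloc


# Usage: "a" only wants 2 nodes, so "b" and "c" split the rest.
shares = equi_partition(10, {"a": 2, "b": 100, "c": 100})
# shares == {"a": 2, "b": 4, "c": 4}
```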

SLIDE 20

Non-predictably Evolving: Adaptive Mesh Refinement

Application Model

• Application knows its speed-up model
• Cannot predict its data evolution
• Aim: maintain a given target efficiency

Behaviour in CooRMv2

• Sends one pre-allocation
  ◮ Simulation parameter: overcommitFactor
• Sends non-preemptible requests inside the pre-allocation
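The idea of holding a target efficiency can be illustrated with a toy speed-up model. Everything in this sketch is an assumption: the slides do not give the AMR speed-up model, so `step_time` uses a made-up formula (work divided among nodes plus a per-node overhead), and the names and constants `a` and `b` are hypothetical.

```python
def step_time(nodes: int, data_size: float, a: float = 1.0, b: float = 0.001) -> float:
    """Hypothetical duration of one step: parallel work plus per-node overhead."""
    return a * data_size / nodes + b * nodes


def efficiency(nodes: int, data_size: float) -> float:
    """Parallel efficiency: speed-up over one node, divided by the node count."""
    return step_time(1, data_size) / (nodes * step_time(nodes, data_size))


def nodes_for_target(data_size: float, target_eff: float, max_nodes: int) -> int:
    """Largest node count (up to max_nodes) whose efficiency meets the target."""
    best = 1
    for n in range(1, max_nodes + 1):
        if efficiency(n, data_size) >= target_eff:
            best = n
    return best


# Usage: as the mesh is refined (data grows), more nodes keep 75% efficiency.
before = nodes_for_target(100.0, 0.75, 1000)
after = nodes_for_target(400.0, 0.75, 1000)
```

In this picture, each time the data size changes the application would recompute the node count and send a new non-preemptible request inside its pre-allocation.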

SLIDE 21

Malleable: Parameter-Sweep Application

Application Model

• Infinite number of single-node tasks
• All tasks have the same duration (known in advance)
• Aim: maximize speed-up

Behaviour in CooRMv2

• Send preemptible requests
• Spawn tasks if resources are available
• Kill tasks if RMS asks to (increases waste)
• Stop tasks if resources will not be available (no waste)
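The spawn / kill behaviour can be sketched as a decision made against the preemptible view. This is a simplified illustration: the function names are hypothetical, the view is assumed to be a list of (time, nodes) steps, and the sketch ignores the fact that already running tasks finish and free their nodes.

```python
from typing import List, Tuple

Step = Tuple[float, int]  # (start_time_s, nodes available from that time on)


def nodes_at(steps: List[Step], t: float) -> int:
    """Value of the step function at time t (0 before the first step)."""
    current = 0
    for start, nodes in sorted(steps):
        if start <= t:
            current = nodes
    return current


def min_nodes(steps: List[Step], start: float, end: float) -> int:
    """Smallest number of nodes available anywhere in [start, end)."""
    lowest = nodes_at(steps, start)
    for t, nodes in steps:
        if start < t < end:
            lowest = min(lowest, nodes)
    return lowest


def plan_tasks(now: float, task_duration: float, running: int,
               steps: List[Step]) -> Tuple[int, int]:
    """Return (tasks_to_spawn, tasks_to_kill) for single-node tasks.

    Killing happens only when the view already offers fewer nodes than are in
    use (wasted work). New tasks start only on nodes that stay available for a
    whole task, so they never need to be killed later (no waste).
    """
    to_kill = max(0, running - nodes_at(steps, now))
    stable = min_nodes(steps, now, now + task_duration)
    to_spawn = max(0, stable - (running - to_kill))
    return to_spawn, to_kill


# Usage: 10 nodes now, but only 4 after 300 s; tasks last 600 s.
view = [(0.0, 10), (300.0, 4)]
decision = plan_tasks(0.0, 600.0, 0, view)
# decision == (4, 0): only 4 nodes are safe for a full task.
```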

SLIDE 23

Scheduling with Spontaneous Updates

Experimental Setup

• Apps: 1xAMR (target eff. = 75%), 1xPSA (task duration = 600 s)
• Resources: number of nodes just enough to fit the AMR
• AMR uses fixed / dynamic allocations

[Figure: AMR (nodes x seconds) vs. AMR overcommit factor; series: Fixed, Dynamic]
[Figure: PSA waste (nodes x seconds) vs. AMR overcommit factor; series: Dynamic]

SLIDE 24

Scheduling with Announced Updates

Experimental Setup

• Apps: 1xAMR (target eff. = 75%), 1xPSA (task duration = 600 s)
• Resources: number of nodes just enough to fit the AMR
• AMR uses announced updates (announce interval)

[Figure: AMR end-time (%) vs. AMR announce interval (s)]
[Figure: PSA waste (%) vs. AMR announce interval (s)]

SLIDE 26

Conclusions

CooRMv2

• A centralized RMS which supports
  ◮ Evolving apps
  ◮ Malleable apps
• Can be used to manage a federation of clusters

Perspectives

• What economic model?
  ◮ Charge for unused pre-allocated resources?
  ◮ Charge for frequency / size of updates?
  ◮ Charge for quality / timeliness of updates?
• Non-homogeneous networks (e.g., torus topology)?

SLIDE 27

Backup Slides

SLIDE 28

AMR Evolution

AMR Examples

AMR Model

[Figure: normalized data size vs. step number]
[Figure: duration of a step (s) vs. number of nodes, for data sizes of 12 GiB, 48 GiB, 196 GiB, 784 GiB and 3136 GiB]

SLIDE 29

Principles: Request Relations

Request Relations

• Dynamic applications → multiple requests + temporal constraints between requests
• relatedTo: an existing request
• relatedHow: FREE, NEXT, COALLOC
• request(), done()

High-level Operations
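The temporal constraints between requests can be sketched as below. The slide only lists the names FREE, NEXT and COALLOC; the meanings used here (no constraint, start right after the related request, start together with it) are assumptions inferred from those names, and `constrained_start` is a hypothetical helper, not part of the `request()` / `done()` interface.

```python
from enum import Enum
from typing import Optional


class RelatedHow(Enum):
    FREE = "FREE"        # assumed: no temporal constraint
    NEXT = "NEXT"        # assumed: starts when the related request ends
    COALLOC = "COALLOC"  # assumed: starts together with the related request


def constrained_start(how: RelatedHow, related_start: float,
                      related_duration: float) -> Optional[float]:
    """Start time imposed by the relation, or None if the RMS is free to choose."""
    if how is RelatedHow.COALLOC:
        return related_start
    if how is RelatedHow.NEXT:
        return related_start + related_duration
    return None


# Usage: a request related to one that starts at t = 100 s and lasts 600 s.
grow_later = constrained_start(RelatedHow.NEXT, 100.0, 600.0)       # 700.0
run_alongside = constrained_start(RelatedHow.COALLOC, 100.0, 600.0)  # 100.0
```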

SLIDE 31

Architecture

CooRM

CooRMv2

SLIDE 32

Interaction

SLIDE 33

RMS Implementation

Main Responsibilities

• Compute views
• Compute start times for each request
• Start requests and allocate resources

Main Idea of the Scheduling Algorithm

• Applications are ordered according to arrival time
• Pre-allocated resources cannot be pre-allocated by next applications
• Preemptible resources are shared equally
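Processing applications in arrival order, so that later ones never take resources reserved by earlier ones, is the core of Conservative Back-Filling. The sketch below shows textbook CBF on a single cluster; it is not the CooRMv2 implementation, and the function names and the `(name, nodes, duration)` job format are hypothetical.

```python
from typing import Dict, List, Tuple

Reservation = Tuple[float, float, int]  # (start, end, nodes)


def used_at(reservations: List[Reservation], t: float) -> int:
    """Nodes reserved at time t."""
    return sum(n for start, end, n in reservations if start <= t < end)


def earliest_start(total_nodes: int, reservations: List[Reservation],
                   nodes: int, duration: float, now: float = 0.0) -> float:
    """Earliest time at which `nodes` are free for the whole duration."""
    # A feasible start is always `now` or the end of an existing reservation.
    candidates = sorted({now} | {end for _, end, _ in reservations if end > now})
    for start in candidates:
        end = start + duration
        # Usage only rises where a reservation starts, so check those points.
        points = {start} | {s for s, _, _ in reservations if start < s < end}
        if all(used_at(reservations, p) + nodes <= total_nodes for p in points):
            return start
    raise ValueError("request does not fit on this cluster")


def conservative_backfill(total_nodes: int,
                          jobs: List[Tuple[str, int, float]]) -> Dict[str, float]:
    """Reserve jobs in arrival order; later jobs never delay earlier ones."""
    reservations: List[Reservation] = []
    starts: Dict[str, float] = {}
    for name, nodes, duration in jobs:
        start = earliest_start(total_nodes, reservations, nodes, duration)
        reservations.append((start, start + duration, nodes))
        starts[name] = start
    return starts


# Usage: C arrives last but fits next to A, so it is back-filled at t = 0.
schedule = conservative_backfill(10, [("A", 6, 100.0), ("B", 6, 50.0), ("C", 4, 100.0)])
# schedule == {"A": 0.0, "B": 100.0, "C": 0.0}
```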

SLIDE 34

AMR Pre-announcements

Experimental Setup

• Launched at t = 0: 1xAMR application, 1xPSA application
• PSA: task duration = 600 s
• AMR: “pre-announces” changes (pre-announce interval)
  ◮ Done either to be nice to other apps
  ◮ Basically, the AMR application makes an UPDATE every interval

SLIDE 35

AMR Pre-announcements (cont.)

Pros

[Figure: PSA waste (%) vs. AMR announce interval (s)]
[Figure: number of reschedules vs. AMR pre-announce interval (s); one value is annotated as 652]

SLIDE 36

AMR Pre-announcements (cont.)

Cons

[Figure: AMR end-time (%) vs. AMR announce interval (s)]
[Figure: used resources (%) vs. AMR pre-announce interval (s)]

SLIDE 37

Nice Resource “Filling”

Experimental Setup

• Launched at t = 0: 1xAMR application, 2xPSA application
• PSA1: task duration = 600 s, PSA2: task duration = 60 s

[Figure: used resources (%) vs. AMR pre-announce interval; series: 1xPSA, 2xPSA, 2xPSA (strict equi-partitioning)]