*Viktor Yarmolenko Rizos Sakellariou School of Computer Science - - PowerPoint PPT Presentation


3rd International Workshop on Middleware for Grid Computing 28-29 November 2005 Grenoble France *Viktor Yarmolenko Rizos Sakellariou School of Computer Science The University of Manchester Manchester, UK *corresponding author:


slide-1
SLIDE 1

*Viktor Yarmolenko Rizos Sakellariou

School of Computer Science The University of Manchester Manchester, UK

*corresponding author: Viktor.Yarmolenmko@manchester.ac.uk

3rd International Workshop on Middleware for Grid Computing 28-29 November 2005 Grenoble France

slide-2
SLIDE 2

Introduction & What is Coming

  • WS-A Terms: Service Level Objectives & Business Value List
  • What are the usual terms for job submission?
  • Why does WS-Agreement need extending?
  • How do we want it to be extended?
  • Simple scenarios to demonstrate extended WS-Agreement at work
  • Simulation model used to prove the point
  • What do the results say?

A Service Level Agreement (SLA) is nothing more than a contract between two or more parties.

WS-Agreement is one of the implementations of SLA.

slide-3
SLIDE 3

The Usual Suspects – SLO & BVL

SLO: NCPU – number of CPU nodes required for the Job
SLO: tD – projected Job duration time for NCPU nodes
SLO: tUP – uniprocessor Job duration time (CPU-hours)
SLO: TS – the earliest time the Job is allowed to start
SLO: TF – the latest time the Job is allowed to finish
SLO: Bjob – projected traffic that the Job creates
BVL: Vpr – the price for executing the Job
BVL: Vpn – the penalty for failing the Job
BVL: Vtot – final value of the agreement (optional)

[Figure: the Job drawn against a time axis, labelled with NCPU, tD, TS and TF]
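As an illustration only, the SLO and BVL terms of this slide can be held in one small record. The class name, field names and the `fits_window` check are hypothetical and are not WS-Agreement schema elements; the check simply restates that a Job of length tD has to fit between TS and TF.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobTerms:
    """Hypothetical container for the SLO and BVL terms of one Job."""
    n_cpu: int                     # SLO: NCPU, number of CPU nodes required
    t_d: float                     # SLO: tD, projected duration on n_cpu nodes
    t_up: float                    # SLO: tUP, uniprocessor duration (CPU-hours)
    t_s: float                     # SLO: TS, earliest allowed start
    t_f: float                     # SLO: TF, latest allowed finish
    b_job: float                   # SLO: Bjob, projected traffic
    v_pr: float                    # BVL: Vpr, price for executing the Job
    v_pn: float                    # BVL: Vpn, penalty for failing the Job
    v_tot: Optional[float] = None  # BVL: Vtot, optional final value

    def fits_window(self) -> bool:
        """True if a Job of length tD can run entirely between TS and TF."""
        return self.t_s + self.t_d <= self.t_f


# Example values are made up for the illustration.
terms = JobTerms(n_cpu=6, t_d=4.0, t_up=24.0, t_s=0.0, t_f=10.0,
                 b_job=1.0, v_pr=100.0, v_pn=20.0)
print(terms.fits_window())
```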

slide-4
SLIDE 4

More Flexibility!!!

  • A list of universal variables
  • A list of predefined common functions
  • Possibility to describe agreement terms as functions


slide-5
SLIDE 5

Universal Terms – Useful Variables & Functions

UT: BRES(t) – Resource bandwidth: nominal or at a given time
UT: tcurr – current wall clock time
UT: d(n) = n + (n-1) + … + 2 + 1 – triangular numbers
UT: Rld(tcurr) – Resource load at a given time: current or any other
UT: tS – actual Job execution start time
UT: tDA – actual Job duration time
UT: BJA(tS, tDA) – actual bandwidth used by the Job
UT: fnorm(t, low, high) – binary function
UT: ftr(t, low, α, high, β) – trapezium
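A minimal sketch of three of these universal functions follows. Only d(n) is fully defined on the slide. The shapes chosen for `f_norm` (1 inside [low, high], 0 outside) and `f_tr` (a unit-height trapezium whose ramps have widths given by its third and fifth arguments) are assumptions made for the illustration, since the slide gives only their names and argument lists.

```python
def d(n: int) -> int:
    """Triangular number n + (n-1) + ... + 2 + 1, as defined on the slide."""
    return n * (n + 1) // 2


def f_norm(t: float, low: float, high: float) -> int:
    """ASSUMED shape: binary window, 1 for low <= t <= high, else 0."""
    return 1 if low <= t <= high else 0


def f_tr(t: float, low: float, alpha: float, high: float, beta: float) -> float:
    """ASSUMED shape: trapezium of height 1.

    Rises linearly on [low - alpha, low], stays at 1 on [low, high],
    falls linearly on [high, high + beta], and is 0 elsewhere.
    """
    if low <= t <= high:
        return 1.0
    if alpha > 0 and low - alpha < t < low:
        return (t - (low - alpha)) / alpha
    if beta > 0 and high < t < high + beta:
        return ((high + beta) - t) / beta
    return 0.0


print(d(4), f_norm(3.0, 2.0, 5.0), f_tr(1.0, 2.0, 2.0, 5.0, 1.0))
```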

slide-6
SLIDE 6

Variable Number of CPUs per Job

SLO: NCPU = {2, 3, 4, …}
SLO: tUP = 24
SLO: $t_D = t_{UP} / N_{CPU}$
SLO: Xother = const

[Figure: CPU vs. Time chart of the same Job for different values of NCPU]

NCPU    tD
12      2
8       3
7       3.43
6       4
4       6
3       8
2       12
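The figure labels on this slide are consistent with tD = tUP / NCPU for tUP = 24, i.e. with an ideal linear speed-up. A short check, with a hypothetical function name, reproduces them:

```python
T_UP = 24.0  # uniprocessor duration, tUP, from the slide


def duration(n_cpu: int, t_up: float = T_UP) -> float:
    """Projected duration tD = tUP / NCPU (ideal linear speed-up)."""
    if n_cpu < 1:
        raise ValueError("n_cpu must be at least 1")
    return t_up / n_cpu


for n in (12, 8, 7, 6, 4, 3, 2):
    print(f"NCPU = {n:2d}  tD = {duration(n):.2f}")
```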

slide-7
SLIDE 7

Adding Variable Bandwidth and Traffic
For All-to-All topology

SLO: NCPU = {2, 3, 4, …}
SLO: tUP = 24
SLO: Xother = const
UT: BRES(tcurr)
UT: tcurr
UT: d(n) = n + (n-1) + … + 2 + 1
SLO: $B_{job} = B_0 \, d(N_{CPU} - 1)$
SLO: $t_D = \dfrac{B_{job}\, t_{UP}}{B_{RES}\, N_{CPU}} = \dfrac{B_0\, t_{UP}\, (N_{CPU} - 1)}{2\, B_{RES}}$

[Figure: six nodes, CPU#1 to CPU#6, connected all-to-all; CPU vs. Time chart of the Job for NCPU = 2, 3, 4, 6, 7, 8 and 12]
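The closed form for the all-to-all case can be checked step by step. The derivation below reads the slide's duration expression as tD = Bjob tUP / (BRES NCPU), which is an interpretation of the extracted layout, and uses the identity d(n) = n(n+1)/2 for triangular numbers; d(NCPU - 1) is then the number of distinct node pairs.

```latex
\begin{align*}
d(n) &= n + (n-1) + \dots + 2 + 1 = \frac{n(n+1)}{2}, \\
B_{job} &= B_0 \, d(N_{CPU} - 1) = \frac{B_0 \, N_{CPU} \, (N_{CPU} - 1)}{2}, \\
t_D &= \frac{B_{job} \, t_{UP}}{B_{RES} \, N_{CPU}}
     = \frac{B_0 \, t_{UP} \, (N_{CPU} - 1)}{2 \, B_{RES}}.
\end{align*}
```

Under this reading, the Job duration grows linearly with NCPU when every node talks to every other node.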

slide-8
SLIDE 8

Adding Variable Bandwidth and Traffic
For Pipe topology

SLO: NCPU = {2, 3, 4, …}
SLO: tUP = 24
SLO: Xother = const
UT: BRES(tcurr)
UT: tcurr
SLO: $B_{job} = B_0 \, (N_{CPU} - 1)$
SLO: $t_D = \dfrac{B_{job}\, t_{UP}}{B_{RES}\, N_{CPU}} = \dfrac{B_0\, t_{UP}\, (N_{CPU} - 1)}{N_{CPU}\, B_{RES}}$

[Figure: six nodes, CPU#1 to CPU#6, connected as a pipe; CPU vs. Time chart of the Job for NCPU = 2, 3, 4, 6, 7, 8 and 12]

slide-9
SLIDE 9

Comparing the Impact of Two Topologies

[Figure: node diagrams, CPU#1 to CPU#6, for each topology; plot of the Duration of the Job, ~tD (0.0 to 2.5), showing its Dependence on NCPU (1 to 6) for the Pipe Topology and the All-to-All Topology]
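The two curves can be tabulated with a short sketch. It uses the traffic terms from the two previous slides (Bjob = B0 d(NCPU - 1) for all-to-all, Bjob = B0 (NCPU - 1) for pipe) and reads the duration expression as tD = Bjob tUP / (BRES NCPU). Setting B0 = tUP = BRES = 1 is an assumption made here so that durations come out in units of B0 tUP / BRES; the function names are hypothetical.

```python
def d(n: int) -> int:
    """Triangular number n + (n-1) + ... + 2 + 1."""
    return n * (n + 1) // 2


def traffic(n_cpu: int, topology: str, b0: float = 1.0) -> float:
    """Projected Job traffic Bjob for the given topology."""
    if topology == "all-to-all":
        return b0 * d(n_cpu - 1)
    if topology == "pipe":
        return b0 * (n_cpu - 1)
    raise ValueError(f"unknown topology: {topology}")


def duration(n_cpu: int, topology: str, t_up: float = 1.0,
             b_res: float = 1.0, b0: float = 1.0) -> float:
    """tD = Bjob * tUP / (BRES * NCPU), as read from the slides."""
    return traffic(n_cpu, topology, b0) * t_up / (b_res * n_cpu)


print("NCPU  pipe   all-to-all")
for n in range(1, 7):
    print(f"{n:4d}  {duration(n, 'pipe'):.3f}  {duration(n, 'all-to-all'):.3f}")
```

In these units the pipe curve stays below 1 and flattens, while the all-to-all curve grows linearly and reaches 2.5 at NCPU = 6, which is consistent with the 0.0 to 2.5 range of the plot's duration axis.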

slide-10
SLIDE 10

Defining the Value of the Service

SLO: Xother = const
UT: BRES(tcurr)
UT: tcurr
SLO: Bjob
SLO: $t_D = \dfrac{B_{job}\, t_{UP}}{B_{RES}\, N_{CPU}}$
UT: Rld(tcurr) = fld
BVL: Vtot = f(Rld, ts, NCPU, …)

[Figure: Building the Vtot function, panels (a) to (d) plotted against Time, t: the ftr, fld and f'tr curves, the levels Vpr max and Vpn max, and the points ts and (ts + tD)]

slide-11
SLIDE 11

Suddenly life becomes more interesting

slide-12
SLIDE 12

The Model
Single & Multiple Negotiations

User: a set of 340 Job requests, for which a solution exists where 100% utilisation of the Resource is possible (147 hours x 64 CPUs)

Resource: capacity of 64 CPUs, available for 147 hours; scheduling by earliest deadline first (single iteration)

tD × NCPU = A; ‹A› = 21.85
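One way to picture "earliest deadline first (single iteration)" is the sketch below. It is an interpretation, not the simulation used for the talk: it assumes whole-hour time slots, integer durations, rigid rectangular Jobs, and a single pass in which a Job that cannot be placed before its deadline TF is rejected. All names are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class Job(NamedTuple):
    n_cpu: int  # NCPU
    t_d: int    # tD in whole hours (assumption)
    t_s: int    # TS, earliest allowed start
    t_f: int    # TF, latest allowed finish


def earliest_start(free: List[int], job: Job) -> Optional[int]:
    """First start hour where the Job fits on free CPUs and meets TF."""
    last_start = min(job.t_f, len(free)) - job.t_d
    for s in range(max(job.t_s, 0), last_start + 1):
        if all(free[h] >= job.n_cpu for h in range(s, s + job.t_d)):
            return s
    return None


def edf_schedule(jobs: List[Job], capacity: int = 64,
                 horizon: int = 147) -> Tuple[List[Tuple[Job, int]], List[Job]]:
    """Single pass over Jobs sorted by deadline; returns (accepted, rejected)."""
    free = [capacity] * horizon  # free CPUs in each hourly slot
    accepted, rejected = [], []
    for job in sorted(jobs, key=lambda j: j.t_f):  # earliest deadline first
        start = earliest_start(free, job)
        if start is None:
            rejected.append(job)
            continue
        for h in range(start, start + job.t_d):
            free[h] -= job.n_cpu  # reserve the CPUs
        accepted.append((job, start))
    return accepted, rejected


# Tiny made-up example on a 4-CPU, 6-hour resource.
demo = [Job(4, 3, 0, 6), Job(2, 3, 0, 3), Job(4, 3, 0, 3)]
accepted, rejected = edf_schedule(demo, capacity=4, horizon=6)
print(accepted)
print(rejected)
```

In the example, the 2-CPU Job with deadline 3 is placed first, the 4-CPU Job with the same deadline no longer fits and is rejected, and the remaining 4-CPU Job runs from hour 3.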

slide-13
SLIDE 13

Variable CPU Scenario (Original vs. Extended SLA)

Original SLA:
User: How about: NCPU=6; tD=4; …
Resource: No can do
User: Then how about: NCPU=4; tD=6; …
Resource: No can do
User: Then how about: NCPU=2; tD=12; …
Resource: Will do

Extended SLA:
User: How about: tD = f(NCPU); …
Resource: Will do

[Figure: both exchanges drawn along a Time axis running from t = 0 to t → ∞]
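The difference between the two exchanges can be sketched as a count of negotiation rounds. Everything here is hypothetical: the `Resource` class, its acceptance rule (enough free CPUs) and the candidate list are assumptions made for the illustration, with tD = tUP / NCPU and tUP = 24 taken from the earlier slide.

```python
from typing import Callable, Iterable, Optional, Tuple

T_UP = 24.0  # uniprocessor duration, tUP

Offer = Tuple[int, float]  # (NCPU, tD)


class Resource:
    """Hypothetical resource: accepts any Job needing at most free_cpus CPUs."""

    def __init__(self, free_cpus: int) -> None:
        self.free_cpus = free_cpus

    def accepts(self, n_cpu: int, t_d: float) -> bool:
        # t_d is ignored by this toy rule; a real resource would check its schedule.
        return n_cpu <= self.free_cpus


def negotiate_original(resource: Resource,
                       offers: Iterable[Offer]) -> Tuple[Optional[Offer], int]:
    """One round per fixed (NCPU, tD) offer, until one is accepted."""
    rounds = 0
    for n_cpu, t_d in offers:
        rounds += 1
        if resource.accepts(n_cpu, t_d):
            return (n_cpu, t_d), rounds
    return None, rounds


def negotiate_extended(resource: Resource, t_d_of: Callable[[int], float],
                       candidates: Iterable[int]) -> Tuple[Optional[Offer], int]:
    """One round: the offer carries tD = f(NCPU) and the resource picks NCPU."""
    for n_cpu in candidates:
        t_d = t_d_of(n_cpu)
        if resource.accepts(n_cpu, t_d):
            return (n_cpu, t_d), 1
    return None, 1


resource = Resource(free_cpus=3)
original = negotiate_original(resource, [(6, 4.0), (4, 6.0), (2, 12.0)])
extended = negotiate_extended(resource, lambda n: T_UP / n, [6, 5, 4, 3, 2])
print(original)
print(extended)
```

With three free CPUs, the fixed offers take three rounds to settle on NCPU = 2, while the functional offer settles in one round and lets the resource choose NCPU = 3 with tD = 8.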

slide-14
SLIDE 14

Only Single Negotiation is Allowed

[Figure: Rate of Rejected Jobs. Axes: The Percentage of Rejected Jobs, % and The Percentage of Processed Jobs, %. Series: using normal SLA, using extended SLA]

slide-15
SLIDE 15

Multiple Negotiations Allowed

[Figure: Axes: The Average Number of Negotiations per Job and The Percentage of Processed Jobs, %. Series: using normal SLA, using extended SLA]

slide-16
SLIDE 16

Was it all worth it?

  • Reduction in traffic associated with negotiation of Resource
  • Reduction in user-service interaction
  • Extended Agreement gives more power to resource allocation, scheduling, management, aggregation of services
  • Extended Agreement is extensible and could support future demands, e.g. new optimisation algorithms, value added services, autonomous services, …