Day 3 Agenda for Today: Formulate simple problem statement


SLIDE 1

Day 3

SLIDE 2

Agenda for Today

  • Formulate a simple problem statement
  • Revisit the workload characterization problem
  • Present a detailed (step-by-step) derivation of the workload

SLIDE 3

Workload Modeling

  • Workload modeling is used to generate synthetic workloads based on real-life job execution observations.
  • The goal is typically to be able to create workloads that can be used in performance evaluation studies.

[Diagram: workload submission flow — workload description (may include QoS) → workload generation (jobs J1 … Jn) → analysis against the SLA → consolidate results → negotiation.]

SLIDE 4

Problem Formulation

  • Given P available processors and m jobs, J = {J_1, J_2, ⋯, J_m}, waiting in a queue to be processed.
  • Problem: allocate p_j processors to job J_j such that the overall execution time is minimized:

    minimize  max_{1 ≤ j ≤ m} T_j(p_j)

    subject to  Σ_{j=1}^{m} p_j ≤ P,   p_j ∈ {1, 2, …, p_max}

  • Where
    – T_j(·) is the execution time function of job j,
    – p_max is the maximum parallelism job j can have,
    – p_j is the unknown processor allocation to job j, and
    – execution of a job starts only when all of its allocated processors are available.

SLIDE 5

Problem Formulation

  • Assume that we have four jobs, J = {J1, J2, J3, J4}.
  • For simplicity, assume that each job requests 3 processors and the service demand of each job is 20 time units.
  • Suppose we have P = 7 available homogeneous processors to be assigned to the 4 jobs.
  • The assignment of processors to the jobs must minimize the overall completion time of the jobs.
  • Assume only space-sharing allocation.
SLIDE 6

Allocation Possibility

  • One possible assignment is
    – J1 = 3, J2 = 3, J3 = 1, J4 = 0

[Timeline: J1 and J2 start first and complete at time 20; J3 and J4 then run and complete at time 40; initialization takes, say, 3 time units.]

  • The total execution time is 43 units.
  • Can we do better?
SLIDE 7

Job Description

  • We have a set of jobs to be executed: J = {J_1, J_2, ⋯, J_n}
  • Number of tasks per job
    – Each job J_j has a set of tasks, J_j = {T_1, T_2, ⋯, T_m}.
    – Note that a task represents a part of the work that must be done serially by a single processor.
    – Interdependence among job tasks is important to consider.
  • A job is said to be “small” (or “large”) if it consists of a small (or large) number of tasks.
  • Tasks with a large service demand may introduce large queuing delays for queued jobs.

SLIDE 8

Job Description

  • A job is said to be “small” (or “large”) if it consists of a small (or large) number of tasks.
  • Based on the analysis of real workload logs used in production:
    – The percentage of small jobs, with a small number of tasks, is higher than that of large jobs, with a large number of tasks.
    – For this reason, we examine the following distribution for the number of tasks per job.
  • Tasks with a large service demand may introduce large queuing delays for queued jobs.

[Figure: distribution of the number of tasks per job — many small jobs, few large jobs.]

SLIDE 9

Job Description

  • A job is completely described by the following parameters:
    – Cumulative job service demand (W)
    – The arrival time
    – The number of tasks
  • A job with one task is called a sequential job, and a job with multiple tasks is called a parallel job.

SLIDE 10

Workload Modeling

  • We will consider the following workloads:
    – WK1 consists of curves with relatively good speedup.
    – WK2 consists of curves with speedup not as good as WK1.
    – WK3 consists of curves with poor speedup.
    – WK4 contains jobs with all three speedup types, each appearing with approximately equal frequency.

SLIDE 11

Workload Generation

  • Sevcik proposed the following model to represent the execution time function of a job that can run on p processors:

    T(p) = φ(p)·W/p + α + β·p

  • It has been shown that a wide range of representative applications can be modeled by utilizing the above execution time function.

SLIDE 12

Workload Generation

  • The execution time function captures both the scale-up and the overhead:

    T(p) = φ(p)·W/p + α + β·p
           (scale-up)  (overhead)

  • The runtime function allows us to create different workloads by choosing different values for the parameters: φ, W, β, and α.
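A small sketch of the runtime function, using φ(p) = 1.0 and, for illustration, the deck's later mean values E(W) = 13.76 and E(β) = 0.30 with α = 0 (these specific parameter choices are assumptions, not fixed by this slide). It also shows that T(p) has an interior minimum: too few processors leave little parallelism, too many pay the β·p overhead.

```python
def exec_time(p, W, phi=1.0, alpha=0.0, beta=0.3):
    """Sevcik-style execution time: T(p) = phi(p)*W/p + alpha + beta*p.
    phi captures load imbalance, alpha the parallelization overhead,
    beta the per-processor communication/congestion cost."""
    return phi * W / p + alpha + beta * p

W = 13.76  # mean service demand used later in the deck
times = {p: exec_time(p, W) for p in range(1, 33)}
p_best = min(times, key=times.get)  # allocation that minimizes T(p)
```

With these parameters the minimum lies at a moderate allocation rather than at p = 32, which is exactly the trade-off the slide describes.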

SLIDE 13

Workload Generation

  • There is a certain level of load imbalance when running a job on multiple processors.
  • The φ(p) parameter in the equation

    T(p) = φ(p)·W/p + α + β·p

    – φ(p) represents the degree to which the work is not evenly spread across the p processors (i.e., load imbalance).
    – Real measurements conducted by Wu show that its value is in the range 1.1 ≤ φ(p) ≤ 1.2.
    – Since this range is close to 1, φ(p) can, for simplicity, be considered equal to 1.0.

SLIDE 14

Workload Generation

  • Note that adding processors to a job reduces computation time, but there is a certain increase in completion time due to work that must be executed sequentially.
  • The α parameter in the equation captures the above:

    T(p) = φ(p)·W/p + α + β·p

    – α represents the increase of the work per processor due to parallelization (i.e., overhead).

SLIDE 15

Workload Generation

  • Note that adding processors to a job reduces computation time but increases communication time.
  • The β parameter in the equation captures the above issue:

    T(p) = φ(p)·W/p + α + β·p

    – β represents the communication and congestion delays, which increase with the number of processors assigned to a job.
    – What this says is that the more processors given to a job, the higher the communication cost and congestion delays.

SLIDE 16

Workload Generation

  • W in the runtime function represents the total service demand of the job:

    T(p) = φ(p)·W/p + α + β·p

    – The mean value of W is 13.76, and the coefficient of variation must be greater than one (e.g., 3.5, 10.0).
  • Small jobs: average service demand = 1.3, accounting for 7/8 of the jobs in the system.
  • Large jobs: average service demand = 101, accounting for 1/8 of the jobs in the system.
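As a quick consistency check, the two class means combine to the stated overall mean E(W) = 13.76:

```python
# Mixture of the two job classes from the slide:
# large jobs (mean 101, 1/8 of jobs) and small jobs (mean 1.3, 7/8 of jobs).
p_large, mean_large = 1 / 8, 101.0
p_small, mean_small = 7 / 8, 1.3
mean_W = p_large * mean_large + p_small * mean_small  # = 13.7625, i.e. ~13.76
```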

SLIDE 17

Workload Speed Up

  • We can determine the speedup of the application on p processors as follows:

    S(p) = [1 + 1/(2·p_max) + 1/(2·μ·p_max)] / [1/p + p/(2·p_max²) + 1/(2·μ·p_max)]

  • Where
    – μ ∈ {+∞, 0.2, 0.4},
    – p_max is the maximum number of processors assigned to the workload (WK1, WK2, WK3).

SLIDE 18

Workload Speed Up

  • We can determine the speedup for WK3, with μ = 0.2, p_max ∈ {4, 16, 64}, and p = 32:

    S(p) = [1 + 1/(2·p_max) + 1/(2·μ·p_max)] / [1/p + p/(2·p_max²) + 1/(2·μ·p_max)]

  • Substituting p_max = 4 into the above:

    S(32) = [1 + 1/(2·4) + 1/(2·0.2·4)] / [1/32 + 32/(2·4²) + 1/(2·0.2·4)] ≈ 1.06
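The substitution above can be checked numerically. A small sketch of the speedup function (symbol names transliterate the slide's notation):

```python
def speedup(p, p_max, mu):
    """Speedup model for the WK workloads; mu = +inf gives the best curves
    (WK1), mu = 0.4 (WK2) and mu = 0.2 (WK3) progressively worse ones."""
    tail = 0.0 if mu == float("inf") else 1.0 / (2.0 * mu * p_max)
    num = 1.0 + 1.0 / (2.0 * p_max) + tail
    den = 1.0 / p + p / (2.0 * p_max**2) + tail
    return num / den

s = speedup(32, p_max=4, mu=0.2)  # the WK3 substitution from the slide
```

With p_max = 4 and 32 processors the speedup is barely above 1, which is why this parameter setting belongs to the poor-speedup workload WK3.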

SLIDE 19

Workload Speed Up

  • The results for the speedup curve when p_max = 4
SLIDE 20

Workload Speed Up

  • The results for the speedup curve when p_max = 16

SLIDE 21

Workload Speed Up

  • The results for the speedup curve when p_max = 64

SLIDE 22

Workload WK1

  • Consists of curves with relatively good speedup.
  • They correspond to μ = +∞.
  • An example is a matrix multiplication application.
  • The number of possible tasks is given by n = W/β.
  • β reflects the communication and congestion delays that increase with the number of processors.

[Figure: structure of a matrix multiplication application — independent tasks Task 1, Task 2, …, Task n.]

SLIDE 23

Workload WK1

  • The actual job service demand value is obtained from a two-stage hyper-exponential distribution, depending on the coefficient of variation of the service time:

    F(w) = P·(1 − e^(−w/101)) + (1 − P)·(1 − e^(−w/1.3))
              (large jobs)          (small jobs)

SLIDE 24

Workload WK1

  • The actual job service demand value is obtained from a two-stage hyper-exponential distribution, depending on the coefficient of variation of the service time:

    F(w) = P·(1 − e^(−w/101)) + (1 − P)·(1 − e^(−w/1.3))

  • Where
    – P = 0.125
    – The mean value of W is 13.76

SLIDE 25

Workload WK1

  • The service demand of WK1 can now be computed as

    F(w) = 0.125·(1 − e^(−w/101)) + (1 − 0.125)·(1 − e^(−w/1.3))
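The distribution above can be sampled directly: pick the branch first, then draw from the corresponding exponential. A small Monte-Carlo sketch using the branch probabilities and means from the slides:

```python
import random

def sample_demand(rng, p=0.125, mean_large=101.0, mean_small=1.3):
    """Draw a job service demand from the two-stage hyper-exponential:
    with probability p the job is 'large', otherwise it is 'small'."""
    mean = mean_large if rng.random() < p else mean_small
    return rng.expovariate(1.0 / mean)  # expovariate takes the rate 1/mean

rng = random.Random(42)
samples = [sample_demand(rng) for _ in range(200_000)]
mean_W = sum(samples) / len(samples)  # should be close to E(W) = 13.76
```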

SLIDE 26

Workload WK2

  • They correspond to μ = 0.4, which indicates that their speedup is not as good as WK1's.
  • Example: n-body simulations of stellar or planetary movements, in which the movement of each body is governed by the gravitational forces produced by the system as a whole.
  • The number of possible tasks is given by n = W/β.

SLIDE 27

Workload WK3

  • They correspond to μ = 0.2, which indicates that their speedup is poorer than that of WK1 and WK2.
  • Example: mean value analysis; these kinds of applications exhibit this property.
  • The number of possible tasks is given by n = W/β.


SLIDE 29

The Arrival Process

  • Jobs arrive to the system in a stochastic manner (J1, J2, J3, J4, …).
  • An exponential distribution with the given mean inter-arrival time can be derived as follows:

    f(x) = λ·e^(−λ·x)

  • where

    1/λ = E(T(1)) / (N ∗ load)

    with N the number of processors and load the target system utilization.

SLIDE 30

The Arrival Process

  • We already know that E(W) = 13.76.

    E(T(1)) = E(W) + E(β) + E(α)

  • By using the theorem of total expectation across the three different values of p_max, we find E(β) = 0.30.

    E(α) = 0.0 if μ = +∞;  2.0 if μ = 0.4;  5.0 if μ = 0.2;  2.3 for mixed speedups.
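Putting the numbers together gives the mean sequential execution time E(T(1)) per workload class; a quick check, assuming the values above:

```python
# E(T(1)) = E(W) + E(beta) + E(alpha) for each workload class.
E_W, E_beta = 13.76, 0.30
E_alpha = {"WK1 (mu=+inf)": 0.0, "WK2 (mu=0.4)": 2.0,
           "WK3 (mu=0.2)": 5.0, "WK4 (mixed)": 2.3}
E_T1 = {wk: E_W + E_beta + a for wk, a in E_alpha.items()}
# e.g. WK1: 14.06, WK2: 16.06, WK3: 19.06, WK4: 16.36 time units
```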

SLIDE 31

Today's Lab

  • We have seen the cloudlet in CloudSim. It just runs with fixed values. We want to change that.
  • Task 1: change the constant value so that the arrival process is based on an exponential distribution.
  • Task 2: currently the execution time of the cloudlet is fixed. Change it to be generated using a 2-stage hyper-exponential distribution.
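The two lab tasks boil down to replacing constants with draws from the two distributions above. A minimal sketch of that sampling logic in Python (the actual change must be made in the CloudSim Java code; no CloudSim class or method names are assumed here):

```python
import random

def make_cloudlet_specs(n, rng, lam=0.5, p_large=0.125,
                        mean_large=101.0, mean_small=1.3):
    """Generate (arrival_time, service_demand) pairs: exponential
    inter-arrival gaps (Task 1) and 2-stage hyper-exponential service
    demands (Task 2). lam is a hypothetical arrival rate."""
    specs, t = [], 0.0
    for _ in range(n):
        t += rng.expovariate(lam)                 # Task 1: arrival process
        mean = mean_large if rng.random() < p_large else mean_small
        demand = rng.expovariate(1.0 / mean)      # Task 2: execution time
        specs.append((t, demand))
    return specs

specs = make_cloudlet_specs(1000, random.Random(0))
```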

SLIDE 32

Thank you.

Questions, Comments, …?

SLIDE 33

Exponential Distribution

  • The exponential distribution:
    – λ measures how many events happen per unit time:

    f(x) = λ·e^(−λ·x)

[Figure: exponential probability density functions for λ = 0.5, 1, 2.]

SLIDE 34

Hyper-exponential distribution

  • Hyper-exponential distributions are used in service demand modeling.
  • The hyper-exponential distribution is obtained by selecting from a mixture of several exponential distributions.
  • The simplest variant has only two stages: with probability P the sample is drawn from an exponential with rate λ_1, and with probability 1 − P from one with rate λ_2 (λ_1 ≠ λ_2).

    f(x) = Σ_{i=1}^{k} p_i·λ_i·e^(−λ_i·x),   0 ≤ p_i ≤ 1,   Σ_{i=1}^{k} p_i = 1

SLIDE 35

Coefficient of Variation

  • The coefficient of variation is defined as follows:

    CV = σ / μ

  • Where
    – σ is the standard deviation
    – μ is the mean
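Applying this definition to the two-stage hyper-exponential service demand used earlier (P = 0.125, branch means 101 and 1.3) shows why a hyper-exponential satisfies the deck's requirement CV > 1. The second moment of an exponential mixture is E[W²] = Σ pᵢ·2mᵢ², where mᵢ are the branch means:

```python
import math

# CV = sigma / mu for the deck's two-stage hyper-exponential.
p, m1, m2 = 0.125, 101.0, 1.3
mean = p * m1 + (1 - p) * m2                  # E[W] = 13.7625
second = 2 * (p * m1**2 + (1 - p) * m2**2)    # E[W^2] for an exponential mixture
sigma = math.sqrt(second - mean**2)           # standard deviation
cv = sigma / mean                             # ~3.53, i.e. the "e.g., 3.5" case
```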

SLIDE 36

Long-lived vs short-lived Jobs

  • Long-lived vs. short-lived processes: the distribution of process lifetimes is heavy-tailed,

    Pr(T > t) ∝ t^(−α),   α ≈ 1

  • This means that most processes are short, but a small number are very long.
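A heavy tail of this kind can be generated with a Pareto distribution via inverse-CDF sampling; the values below (α = 1.1, minimum lifetime 1.0) are hypothetical, chosen only to illustrate the "many short, a few very long" behavior:

```python
import random
import statistics

def pareto_lifetime(rng, alpha=1.1, x_min=1.0):
    """Heavy-tailed lifetime with Pr(T > t) = (x_min / t)^alpha for t >= x_min,
    sampled by inverting the CDF; u is drawn from (0, 1]."""
    u = 1.0 - rng.random()
    return x_min * u ** (-1.0 / alpha)

rng = random.Random(1)
lifetimes = [pareto_lifetime(rng) for _ in range(100_000)]
typical = statistics.median(lifetimes)   # most processes are short ...
longest = max(lifetimes)                 # ... but a few are very long
```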


SLIDE 38

Performance Measurements

  • Classical performance metrics:
    – response time,
    – throughput,
    – scalability,
    – resource/cost/energy efficiency,
    – elasticity,
    – availability,
    – reliability,
    – security, and
    – SLA violations

SLIDE 39

References

  • T. Thanalapati and S. P. Dandamudi, "An Efficient Adaptive Scheduling Scheme for Distributed Memory Multicomputers," IEEE Trans. Parallel Distrib. Syst. 12(7): 758–768 (2001).
  • A. Iosup and D. H. J. Epema, "Grid Computing Workloads," IEEE Internet Computing 15(2): 19–26 (2011).
  • D. G. Feitelson, Workload Modeling for Computer Systems Performance Evaluation. Cambridge University Press, 2015.