Day 3
Agenda for Today
- Formulate a simple problem statement.
- Revisit the workload characterization problem.
- Present a detailed (step-by-step) derivation of the workload.
Workload Modeling
- Workload modeling is used to generate synthetic workloads based on observations of real-life job executions.
- The goal is typically to create workloads that can be used in performance evaluation studies.
[Figure: workload modeling flow with boxes labelled Workload Submission, Workload description (may include QoS), Workload Generation (jobs J1 … Jn), Analysis, SLA, Negotiation, and Consolidate result.]
Problem Formulation
- Given $P$ available processors and $m$ jobs, $\mathcal{J} = \{J_1, J_2, \dots, J_m\}$, waiting in a queue to be processed
- Problem: allocate $p_j$ processors to job $J_j$ such that the overall execution time is minimized
$$\text{minimize} \sum_{j=1}^{m} T_j(p_j)$$
$$\text{subject to} \sum_{j=1}^{m} p_j \le P, \qquad p_j \in \{1, 2, \dots, p_j^{\max}\}$$
- Execution starts only when $p_j = p_j^{\max}$ processors are allocated, $\forall j$.
- Let $T_j(p)$ be the execution time function of job $j$, $p_j^{\max}$ the maximum parallelism job $j$ can have, and $p_j$ the unknown processor allocation to job $j$.
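For small instances, the formulation above can be checked by exhaustive search. The sketch below uses a hypothetical helper, `best_allocation` (not from the source): it enumerates every allocation with $p_j \in \{1, \dots, p_j^{\max}\}$, discards those that exceed $P$, and keeps the one with the smallest $\sum_j T_j(p_j)$. The execution-time functions in the usage line ($T_j(p) = 20/p$, i.e. perfect speedup) are illustrative assumptions only.

```python
from itertools import product
from typing import Callable, Optional, Sequence, Tuple


def best_allocation(
    P: int,
    pmax: Sequence[int],
    T: Sequence[Callable[[int], float]],
) -> Optional[Tuple[float, Tuple[int, ...]]]:
    """Minimize sum_j T_j(p_j) s.t. sum_j p_j <= P and 1 <= p_j <= pmax_j."""
    best = None
    # Cartesian product of each job's admissible processor counts.
    for alloc in product(*(range(1, pm + 1) for pm in pmax)):
        if sum(alloc) > P:
            continue  # violates the processor-capacity constraint
        cost = sum(Tj(pj) for Tj, pj in zip(T, alloc))
        if best is None or cost < best[0]:
            best = (cost, alloc)
    return best  # None when no allocation is feasible


# Illustrative only: three jobs with perfect speedup, T_j(p) = 20/p, on P = 7.
jobs = [lambda p: 20 / p] * 3
print(best_allocation(7, [3, 3, 3], jobs))  # cost ≈ 26.67 with allocation (2, 2, 3)
```

Exhaustive search grows as $\prod_j p_j^{\max}$, so it only serves as a reference for tiny cases.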
Problem Formulation
- Assume that we have four jobs, $\mathcal{J} = \{J_1, J_2, J_3, J_4\}$.
- For simplicity, assume that each job requests 3 processors and that the service demand of each job is 20 time units.
- Suppose we have $P = 7$ available homogeneous processors to be assigned to the 4 jobs.
- The assignment of processors to the jobs must minimize the overall completion time of the jobs.
- Assume space-sharing allocation only.
Allocation Possibility
- One possible assignment is
- J1 = 3,
- J2 = 3
- J3 = 1
- J4 = 0
[Figure: execution timeline (time axis marked at 20 and 40). After an initialization of about 3 time units, J1 and J2 start; J3 and J4 start once J1 and J2 have completed.]
- The total execution time is 43 units.
- Can we do better?
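The 43-unit figure follows from simple arithmetic: with 7 processors and 3-processor requests, only two jobs fit side by side, so the four jobs run in two waves of 20 units after the roughly 3-unit initialization ($3 + 2 \times 20 = 43$). A minimal sketch, assuming rigid jobs that start only when their full request is free and a single initialization delay before the first wave; `rigid_makespan` is a hypothetical name.

```python
import math


def rigid_makespan(num_jobs: int, request: int, processors: int,
                   demand: float, init: float) -> float:
    """Completion time when rigid jobs run in back-to-back waves.

    Assumptions: a job starts only when all `request` processors are free,
    each job runs `demand` time units once started, and a one-off
    initialization delay `init` precedes the first wave.
    """
    per_wave = processors // request  # jobs that fit side by side
    if per_wave == 0:
        raise ValueError("one job's request exceeds the available processors")
    waves = math.ceil(num_jobs / per_wave)  # waves executed one after another
    return init + waves * demand


print(rigid_makespan(num_jobs=4, request=3, processors=7, demand=20, init=3))  # 43
```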
Job Description
- We have a set of jobs to be executed
$\mathcal{J} = \{J_1, J_2, \dots, J_m\}$
- Number of tasks per job
– Each job $J_j$ has a set of tasks, $J_j = \{t_1, t_2, \dots, t_n\}$
– Note that a task represents a part of the work that must be done serially by a single processor.
– Interdependence among a job's tasks is important to consider.
- A job is said to be “small” (or “large”) if it consists of a small (or large) number of tasks.
- Tasks with a large service demand may introduce
large queuing delays for queued jobs.
Job Description
- Based on the analysis of real workload logs from production systems:
– the percentage of small jobs (with a small number of tasks) is higher than that of large jobs (with a large number of tasks).
– For this reason, we examine the following distribution for the number of tasks per job.
[Figure: distribution of the number of tasks per job (labels: Small jobs, Large jobs).]
Job Description
- A job is completely described by the following parameters:
– Cumulative job service demand (W)
– Arrival time
– Number of tasks
- A job with one task is called a sequential job, and a job with multiple tasks is called a parallel job.
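The three parameters above map naturally onto a small record type. A minimal sketch; the class and field names are hypothetical, chosen only to mirror the slide's wording.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """Hypothetical job record holding the three parameters listed above."""
    service_demand: float  # cumulative job service demand W
    arrival_time: float    # time at which the job enters the queue
    num_tasks: int         # number of tasks in the job

    def __post_init__(self) -> None:
        if self.num_tasks < 1:
            raise ValueError("a job must have at least one task")

    @property
    def is_sequential(self) -> bool:
        # One task -> sequential job; several tasks -> parallel job.
        return self.num_tasks == 1


print(Job(service_demand=1.3, arrival_time=0.0, num_tasks=1).is_sequential)  # True
```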
Workload Modeling
- We will consider the following workloads:
– WK1 consists of curves with relatively good speedup.
– WK2 consists of curves with speedup not as good as WK1.
– WK3 consists of curves with poor speedup.
– WK4 contains jobs with all three speedup types, each appearing with approximately equal frequency.
Workload Generation
- Sevcik proposed the following model to represent the execution time function of a job that can run on $p$ processors:
$$T(p) = \phi(p)\,\frac{W}{p} + \alpha + \beta \cdot p$$
- It has been shown that a wide range of representative applications can be modeled using the above execution time function.
Workload Generation
- The execution time function captures both the scale-up and the overhead:
$$T(p) = \underbrace{\phi(p)\,\frac{W}{p}}_{\text{scale up}} + \underbrace{\alpha + \beta \cdot p}_{\text{overhead}}$$
- The runtime function allows us to create different workloads by choosing different values for the parameters $\phi$, $W$, $\beta$, and $\alpha$.
Workload Generation
- There is a certain level of load imbalance when running a job on multiple processors.
- The $\phi(p)$ parameter in the equation
$$T(p) = \phi(p)\,\frac{W}{p} + \alpha + \beta \cdot p$$
– $\phi(p)$ represents the degree to which the work is not evenly spread across the $p$ processors (i.e., load imbalance).
– Real measurements conducted by Wu show that its value lies in the range $1.1 \le \phi(p) \le 1.2$.
– Since this range is close to 1, $\phi(p)$ can be approximated as 1.0.
Workload Generation
- Note that adding processors to a job reduces computation time, but completion time also increases to some degree due to sequential execution.
- The $\alpha$ parameter in the equation captures this effect:
$$T(p) = \phi(p)\,\frac{W}{p} + \alpha + \beta \cdot p$$
– $\alpha$ represents the increase of the work per processor due to parallelization (i.e., overhead).
Workload Generation
- Note that adding processors to a job reduces computation time but increases communication time.
- The $\beta$ parameter in the equation captures this issue:
$$T(p) = \phi(p)\,\frac{W}{p} + \alpha + \beta \cdot p$$
– $\beta$ represents the communication and congestion delays, which increase with the number of processors assigned to the job.
– In other words, the more processors a job is given, the higher its communication cost and congestion delays.
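The trade-off between the $W/p$ term and the $\beta p$ term can be made concrete. Setting $dT/dp = -\phi W/p^2 + \beta = 0$ gives a continuous minimizer $p^* = \sqrt{\phi W / \beta}$ (a standard calculus step, not stated on the slides; a real allocation would round it to an integer and cap it). A minimal sketch with hypothetical helper names; the parameter values reuse the mean values quoted in these notes for the arrival process ($E(W) = 13.76$, $E(\beta) = 0.30$, $E(\alpha) = 2.0$ for $\sigma = 0.4$) purely as an illustration.

```python
import math


def execution_time(p: float, W: float, alpha: float, beta: float,
                   phi: float = 1.0) -> float:
    """Sevcik-style execution time T(p) = phi * W / p + alpha + beta * p."""
    return phi * W / p + alpha + beta * p


def optimal_processors(W: float, beta: float, phi: float = 1.0) -> float:
    """Continuous minimizer of T(p), from dT/dp = -phi*W/p**2 + beta = 0."""
    if beta <= 0:
        raise ValueError("beta must be positive for a finite minimizer")
    return math.sqrt(phi * W / beta)


W, alpha, beta = 13.76, 2.0, 0.30  # illustrative mean values
print(execution_time(1, W, alpha, beta))  # T(1) = W + alpha + beta
print(optimal_processors(W, beta))        # ≈ 6.77 processors
```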
Workload Generation
- $W$ in the runtime function represents the total service demand of the job:
$$T(p) = \phi(p)\,\frac{W}{p} + \alpha + \beta \cdot p$$
– The mean value of $W$ is 13.76, and its coefficient of variation must be greater than one (e.g., 3.5 or 10.0).
- Small jobs: average service demand = 1.3; they account for 7/8 of the jobs in the system.
- Large jobs: average service demand = 101; they account for 1/8 of the jobs in the system.
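These two job classes define a two-stage hyper-exponential mixture, and its moments can be checked directly: the mean is $\tfrac{7}{8}(1.3) + \tfrac{1}{8}(101) = 13.7625$, matching the quoted 13.76, and since an exponential with mean $m$ has $E[X^2] = 2m^2$, the coefficient of variation comes out near 3.53, consistent with the "e.g., 3.5" above. A minimal sketch (`hyperexp_moments` is a hypothetical helper):

```python
import math
from typing import Sequence, Tuple


def hyperexp_moments(probs: Sequence[float],
                     means: Sequence[float]) -> Tuple[float, float]:
    """Mean and coefficient of variation of a hyper-exponential mixture.

    Stage i is selected with probability probs[i] and is exponential with
    mean means[i]; an exponential with mean m has E[X^2] = 2 * m**2.
    """
    mean = sum(p * m for p, m in zip(probs, means))
    second_moment = sum(p * 2.0 * m * m for p, m in zip(probs, means))
    std = math.sqrt(second_moment - mean ** 2)
    return mean, std / mean


# Small jobs: 7/8 of jobs, mean 1.3; large jobs: 1/8 of jobs, mean 101.
mean, cv = hyperexp_moments([7 / 8, 1 / 8], [1.3, 101.0])
print(round(mean, 4), round(cv, 2))  # 13.7625 3.53
```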
Workload Speed Up
- We can determine the speedup of the application on $p$ processors as follows:
$$S(p) = \frac{1 + \frac{1}{p_{\max}^2} + \frac{1}{\sigma\,p_{\max}^2}}{\frac{1}{p} + \frac{p}{p_{\max}^2} + \frac{1}{\sigma\,p_{\max}^2}}$$
- where
– $\sigma \in \{\infty, 0.2, 0.4\}$
– $p_{\max}$ is the maximum number of processors assigned to the workload (WK1, WK2, WK3).
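One way to read this formula (our interpretation, not stated on the slides): it has the shape $T(1)/T(p)$ of the execution time function with $W = 1$, $\beta = 1/p_{\max}^2$ and $\alpha = 1/(\sigma p_{\max}^2)$. Under that reading, $S(1) = 1$; $\sigma = \infty$ removes the constant term, so speedup peaks at $p = p_{\max}$; and smaller $\sigma$ flattens the curve. A minimal sketch (`speedup` is a hypothetical name):

```python
import math


def speedup(p: float, pmax: float, sigma: float) -> float:
    """Speedup S(p) from the formula above; sigma may be math.inf."""
    a = 1.0 / pmax ** 2            # coefficient of the p-dependent term
    b = 1.0 / (sigma * pmax ** 2)  # constant term; 0.0 when sigma is infinite
    return (1.0 + a + b) / (1.0 / p + p * a + b)


# Speedup at p = 1, 4, 16, 64 for p_max = 16 and the three sigma values.
for sigma in (math.inf, 0.4, 0.2):
    print(sigma, [round(speedup(p, 16, sigma), 2) for p in (1, 4, 16, 64)])
```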
Workload Speed Up
- We can determine the speedup for WK3 with $\sigma = 0.2$ and $p_{\max} = 1, 4, 6, 9$, $p = 32$
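- Substituting $p_{\max} = 4$ and $p = 32$ into the speedup function above:
$$S(32) = \frac{1 + \frac{1}{4^2} + \frac{1}{\sigma \cdot 4^2}}{\frac{1}{32} + \frac{32}{4^2} + \frac{1}{\sigma \cdot 4^2}}$$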
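Completing the arithmetic with $\sigma = 0.2$ (a worked check of the expression, not a value quoted on the slide): each $\frac{1}{\sigma \cdot 4^2}$ term equals $\frac{1}{3.2} = 0.3125$, so

$$S(32) = \frac{1 + 0.0625 + 0.3125}{0.03125 + 2 + 0.3125} = \frac{1.375}{2.34375} = \frac{44}{75} \approx 0.587$$

With 32 processors, eight times $p_{\max} = 4$, the job would run slower than on one processor ($S < 1$). For comparison, $p = 4$ gives $S(4) = \frac{1.375}{0.8125} = \frac{22}{13} \approx 1.69$.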
Workload Speed Up
- The results for the speedup curve when $p_{\max} = 4$
Workload Speed Up
- The results for the speedup curve when $p_{\max} = 16$
Workload Speed Up
- The results for the speedup curve when $p_{\max} = 64$
Workload WK1
- Consists of curves with relatively
good speedup.
- They correspond to $\sigma = +\infty$.
- An example is the matrix multiplication application.
- The number of possible tasks is
given by π = π πΎ
- $\beta$ reflects the communication and congestion delays that increase with the number of processors.
[Figure: structure of the matrix multiplication application, consisting of Task 1, Task 2, Task 3, …, Task n−1, Task n.]
Workload WK1
- The actual job service demand is drawn from a two-stage hyper-exponential distribution, whose parameters depend on the coefficient of variation of the service time:
$$F(x) = \underbrace{p\left(1 - e^{-x/101}\right)}_{\text{large jobs}} + \underbrace{(1 - p)\left(1 - e^{-x/1.3}\right)}_{\text{small jobs}}$$
Workload WK1
$$F(x) = p\left(1 - e^{-x/101}\right) + (1 - p)\left(1 - e^{-x/1.3}\right)$$
- where
– $p = 0.125$
– The mean value of $W$ is 13.76
Workload WK1
- With these parameters, the CDF of the WK1 service demand evaluated at the mean, $x = 13.76$, is
$$F(13.76) = 0.125\left(1 - e^{-13.76/101}\right) + (1 - 0.125)\left(1 - e^{-13.76/1.3}\right)$$
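Evaluating the expression numerically shows what it measures: $F(13.76) \approx 0.891$, i.e. under this model about 89% of WK1 jobs have a service demand below the mean, which reflects how skewed the mixture is. A minimal sketch (`wk1_cdf` is a hypothetical name; defaults are the WK1 parameters above):

```python
import math


def wk1_cdf(x: float, p: float = 0.125, large_mean: float = 101.0,
            small_mean: float = 1.3) -> float:
    """Two-stage hyper-exponential CDF used for WK1 service demands."""
    large = 1.0 - math.exp(-x / large_mean)  # P(large-job demand <= x)
    small = 1.0 - math.exp(-x / small_mean)  # P(small-job demand <= x)
    return p * large + (1.0 - p) * small


print(round(wk1_cdf(13.76), 3))  # 0.891
```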
Workload WK2
- They correspond to $\sigma = 0.4$, which indicates that their speedup is not as good as that of WK1.
- Example: n-body simulations of
stellar or planetary movements, in which the movement of each body is governed by the gravitational forces produced by the system as a whole
- The number of possible tasks is
given by
π = π πΎ
Workload WK3
- They correspond to $\sigma = 0.2$, which indicates that their speedup is poorer than that of WK1 and WK2.
- Example: Mean value analysis
- kinds of applications exhibit this
property.
- The number of possible tasks is
given by
π = π πΎ
The Arrival Process
- Jobs arrive at the system in a stochastic manner.
[Figure: arriving jobs J1, J2, J3, J4.]
- Inter-arrival times follow an exponential distribution, whose mean inter-arrival time can be derived as follows:
$$f(y) = \lambda \cdot e^{-\lambda y}$$
- where
$$\frac{1}{\lambda} = \frac{E(T(1))}{P \cdot \text{load}}$$
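A minimal sketch of generating job arrivals with this model, using `random.expovariate` from Python's standard library (it takes the rate $\lambda$, not the mean). Assumptions: "load" is read as the offered system load; $P = 32$ and load $= 0.7$ are hypothetical values; 14.06 is $E(T(1))$ for WK1 ($13.76 + 0.30 + 0.0$). `mean_interarrival` and `arrival_times` are hypothetical helper names.

```python
import random
from itertools import accumulate
from typing import List, Optional


def mean_interarrival(mean_T1: float, processors: int, load: float) -> float:
    """1/lambda = E(T(1)) / (P * load)."""
    return mean_T1 / (processors * load)


def arrival_times(n: int, mean_gap: float,
                  seed: Optional[int] = None) -> List[float]:
    """Arrival instants of n jobs with exponential inter-arrival times."""
    rng = random.Random(seed)
    gaps = (rng.expovariate(1.0 / mean_gap) for _ in range(n))  # rate = lambda
    return list(accumulate(gaps))  # running sum of gaps = arrival times


# Hypothetical setting: WK1, P = 32 processors, load = 0.7.
gap = mean_interarrival(14.06, 32, 0.7)
print(gap, arrival_times(5, gap, seed=1))
```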
The Arrival Process
- We already know that E(W) = 13.76.
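$$E(T(1)) = E(W) + E(\beta) + E(\alpha)$$
- By using the theorem of total expectation across the three different values of $p_{\max}$, we find $E(\beta) = 0.30$
$$E(\alpha) = \begin{cases} 0.0 & \text{if } \sigma = +\infty \\ 2.0 & \text{if } \sigma = 0.4 \\ 5.0 & \text{if } \sigma = 0.2 \\ 2.3 & \text{mixed speedup} \end{cases}$$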
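Substituting these values into $E(T(1)) = E(W) + E(\beta) + E(\alpha)$ gives the mean single-processor execution time for each workload (plain addition; the mapping of $\sigma$ values to workloads follows the WK1 to WK4 definitions):

$$E(T(1)) = 13.76 + 0.30 + E(\alpha) = \begin{cases} 14.06 & \sigma = +\infty \ (\text{WK1}) \\ 16.06 & \sigma = 0.4 \ (\text{WK2}) \\ 19.06 & \sigma = 0.2 \ (\text{WK3}) \\ 16.36 & \text{mixed speedup (WK4)} \end{cases}$$

Each value then sets the mean inter-arrival time for its workload.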
Today Lab
- We have seen the cloudlet in CloudSim. It currently runs with fixed values, and we want to change that.
- Task 1: replace the constant value so that the arrival process follows an exponential distribution.
- Task 2: currently the execution time of the cloudlet is fixed. Change it so that it is generated from a two-stage hyper-exponential distribution.
Thank you.
Questions, Comments, …?
Exponential Distribution
- The Exponential:
– $\lambda$ measures how many things happen per unit time: $f(y) = \lambda \cdot e^{-\lambda y}$
[Figure: exponential probability density functions over X for λ = 0.5, 1, and 2.]
Hyper-exponential distribution
- Hyper-exponential distributions are used in service demand modeling.
- The hyper-exponential distribution is obtained by selecting from a mixture of several exponential distributions.
- The simplest variant has only two stages:
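[Figure: two-stage hyper-exponential: with probability p an exponential stage with rate λ1 is selected, and with probability 1 − p a stage with rate λ2.]
$$f(y) = \sum_{i=1}^{k} p_i \cdot \lambda_i \cdot e^{-\lambda_i y}$$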
0 β€ ππ β€ 1 0 β€ π1, π2π β€ 1
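The mixture definition translates directly into a sampler: pick stage $i$ with probability $p_i$, then draw from an exponential with rate $\lambda_i$. A minimal sketch using Python's standard `random` module (`hyperexp_sample` is a hypothetical helper), with the WK1 parameters from the earlier slides ($p = 0.125$ with mean 101, otherwise mean 1.3).

```python
import random
from typing import Sequence


def hyperexp_sample(rng: random.Random, probs: Sequence[float],
                    rates: Sequence[float]) -> float:
    """Draw one value from a k-stage hyper-exponential distribution.

    Stage i is chosen with probability probs[i]; the value is then drawn
    from an exponential distribution with rate rates[i] (mean 1/rates[i]).
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("stage probabilities must sum to 1")
    i = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return rng.expovariate(rates[i])


# WK1-style parameters: p = 0.125 with mean 101, otherwise mean 1.3.
rng = random.Random(42)
demands = [hyperexp_sample(rng, [0.125, 0.875], [1 / 101, 1 / 1.3])
           for _ in range(100_000)]
print(sum(demands) / len(demands))  # close to 13.76
```

A generator like this is one way to approach Lab Task 2; how the drawn values are passed to a cloudlet depends on the CloudSim version and is not shown here.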
Coefficient of Variation
- The coefficient of variation is defined as follows:
$$C_v = \frac{\sigma}{\mu}$$
- where
– $\sigma$ is the standard deviation
– $\mu$ is the mean
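For measured or generated data, $C_v$ can be estimated from a sample. A short sketch with Python's `statistics` module; it uses the population standard deviation (`pstdev`), which is a choice on our part (`stdev` would give the sample estimate). For reference, an exponential distribution has $C_v = 1$, which is why service demands with $C_v > 1$ call for a hyper-exponential model.

```python
import statistics
from typing import Sequence


def coefficient_of_variation(samples: Sequence[float]) -> float:
    """C_v = sigma / mu, using the population standard deviation."""
    mu = statistics.fmean(samples)
    return statistics.pstdev(samples, mu) / mu


print(coefficient_of_variation([2, 4, 4, 4, 5, 5, 7, 9]))  # 0.4
```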
Long-lived vs short-lived Jobs
- Long-lived processes
- short-lived processes
π π > π β π½2, π½ β 1
- This means that most processes are short, but
a small number are very long.
Performance Measurements
- Classical performance metrics:
– Response time
– Throughput
– Scalability
– Resource/cost/energy
– Efficiency
– Elasticity
– Availability
– Reliability
– Security
– SLA violation
References
- Thyagaraj Thanalapati, Sivarama P. Dandamudi:
An Efficient Adaptive Scheduling Scheme for Distributed Memory Multicomputers. IEEE Trans. Parallel Distrib. Syst. 12(7): 758-768 (2001)
- A. Iosup and D.H.J. Epema, Grid Computing
Workloads, IEEE Internet Computing 15(2): 19-26 (2011)
- Feitelson DG. Workload modeling for computer